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Sensory  Information  Processing 


Thl a research  uses  digital  computation  to  investigate 
processes,  both  linear  and  nonlinear,  for  the  filtering, 
restoration,  snhancsmsnt,  bandwidth  reduction,  distortion 
Immunization  and  analyst  of  both  visual  and  auditory  information. 


Section  1-3  reports  additional  results  in  image  deblurring 
that  extend  results  reported  in  the  last  semi-annual  report  (I  July 
74  - 31  Dec  74).  Technical  reports  covering  this  work  are  planned. 
Our  initial  efforts  to  extend  these  results  into  culor  are  reported 

In  Section  1 1 -4. 

Several  Items  of  substantial  progress  are  reported  in  the 
speech  area,  notably  Sections  5,  G,  8,  and  10.  Section  5 reports 
the  results  obtained  from  basic  LPC  spectral  matching  algorithms 
that  have  been  developed  and  implemented  as  tools  for  all 
researchers  at  thl.s  installation.  Section  G reports  continuing 
results  In  Improving  quality  of  processed  audio.  Section  8 reports 
final  results  of  Isolated  word  recognition  work,  and  Section  9 
reporte  some  extentiona  contemplated  along  similar  lines.  Section 
10  reports  ongoing  results  in  methods  for  improving  speech 
Intel  I Iglbl 1 1 ty. 


Sections  7 and  11  report  results  that  have  been  in  development 
some  time.  These  Items  h«va  not  previously  been  reported,  and  have 
reached  the  state  of  having  been  documented  during  this  period. 
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Separate  ARPA  technical  report®  on  thaw  itema  are  not  now 
contemplated. 

Section  12  outlines  an  area  of  future  interest  concerning  the 
estimation  of  the  phase  of  a speech  wave. 

Symbo i i c Computa  t i on 

The  goal  of  thie  research  is  the  production  of  efficient 
programs  for  algebraic  and  symbolic  computation  related  to 
scientific  research.  The  long  range  objective  is  the  development  of 
a completely  automatic  algebraic  programming  system  which  can  be 
moved  eael  ly  from  one  computer  to  another. 

Section  1 reports  continuing  progress*  in  mode  analyzing  for 
REOUCE.  Section  2 reports  ongoing  results  to  improve  the  IRN  160 
LISP  which  ie  basic  to  the  REOUCE  transportability.  Section  3 
reports  continuing  efforts  to  improve  implementation  of  space 
mairicee  and  factored  polynomial  algebra.  Specific  new  packages 
that  have  been  implemented  are  described  in  Section  4. 

Graph ice 

Thle  research  effort  plane  to  complete  the  construction  and 
demonstration  of  a system  uhich  makes  shaded  pictures  of  computer 
models  of  three-dimensional  illuminated  objects,  and  to  conduct 
research  to  find  substantially  improved  modeling  techniques  for 
dynamically  changing  object  collections. 
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Several  items  of  research  reached  fulfillment  during  the  period 
as  reported  in  Sections  1,  2,  and  4.  Related  technical  reports  are 
In  process.  Section  3 Indicates  ongoing  work. 


II.  RESEARCH  ACT1 IVI IT1 IES 
SENSORY  INFORMATION  PROCESSING 


Section  1 

Restoration  of  Blurred  Images 
Brent  Baxter 

This  section  discusses  the  continuation  of  work  reported  under 
the  eame  title  In  the  previous  semi-annual  report  (1  July  1974  to  31 
December  1974).  That  report  noted  in  some  detail  a model  for 
predicting  brightness  perception  which  was  able  to  adapt  to  the 
epacial  frequency  content  of  an  image.  A little  reflection 
euggested  a elmllarity  between  blurred  photographs  and  images 
containing  week  high  frequency  texture,  and  a system  capable  of 
accentuating  weak  texture  was  devised  which  proved  euperior  to 
previous  methods  for  restoring  blurred  photographs. 

1.1  Restoration  of  Blurred  Images 

The  ideae  upon  which  this  section  is  based  first  came  to  mind 
by  observing  that  each  AGC  element  in  the  frequency  selective  model 
has  associated  with  it  a criterion  for  determining  when  the  signal 
in  Its  channel  is  too  weak  and  must  be  amplified.  Thi9  means  that 
the  local  power  spectrum  * is  being  adjusted  to  some  previously 
specified  prototypical  value.  These  Ideas  led  to  the  deblurrlng 
mechanism  in  Figure  1.1. 


* The  idea  of  a local  power  spectrum  is  similar  to  the 
short  time  spectrum  on  which  speech  spectrograms  are 
based. 


The  method  will  be  described  as  concisely  as  possible  despite 
the  fact  that  In  certain  respects  its  mathematical  foundations  are 
rather  subtle.  Following  thi s discussion,  some  comparisons  between 
the  deblurrlng  method  and  the  frequency  selective  mode  I will  be  made 
which  suggest  a limited  deblurrlng  capability  in  human  vision. 

Blurring  caused  by  camera  motion  or  an  out  of  focus  lens  are 
processes  that  combine  a clear  image  with  a blur  impulse  response  by 
convolution.  An  out  of  focus  blur  has  a cylindrical  impulse 
response  and  a motion  blur  has  a fence- 1 ike  impulse  response. 
Stylized  examples  are  shown  in  Figures  1.3  and  1.17.  The  process  of 
deconvolution  [1]  * is  the  process  of  removing  one  of  the  components 
(the  blur)  by  first  mapping  the  blurred  image  into  a space  in  which 
the  two  components  are  added  rather  than  convolved,  and  then  the 
blur  Information  \u  removed  by  linear  filtering  **. 

Several  problems  stand  in  ths  way  of  what  might  seem  otherwise 
to  be  a straightforward  tasks 


•ft  See  references  for  Sections  1,  2 and  3 after  Section  3. 
There  ie  a terminology  associated  with  this  style  of 
signal  processing  in  which  the  filtering  Just 
described  is  known  as  llftsring,  and  it  is  implemented 
by  multiplication  in  the  so-called  cepstral  domain. 
The  independent  variable  in  this  domain  is  called 
quefrency  and  is  dsscribed  39  being  9hort  or  long 
rather  than  high  and  low  as  is  the  case  for  ths 
frequency  domain.  Using  these  terms  Figure  1.1  is 
really  a long  pass  lifter.  For  ths  purposes  of  this 
report  the  more  familiar  terms  Fourier  transform, 
logarithm  and  filter  will  be  used  to  describe  these 
processes. 
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(1)  The  first  complication  Involves  noiee-iike 
fluctuations,  known  as  film  grain,  introduced  when  imagee 
are  recorded  on  photographic  film.  If  the  restoration  Is 
to  be  acceptable  these  random  fluctuations  muet  not  be 
amplified  sxceeelvaly  by  the  restoration  process, 

(2)  The  second  difficulty  arises  because 
epace-l imi ted  blurs  of  the  type  illustrated  above 
completely  eliminate  image  energy  at  certain  epat’al 
frequencies  making  an  exact  restoration  impossible  even  in 
tho  absence  of  fHm  grain  noise.  However,  If  there  is  no 
nolee  and  only  a few  frequencies  are  el iminated,  their 
absence  goes  almost  unnoticed.  Figures  1.2-1. 6 illustrate 
thl  •• 

(3)  The  third  problem  is  that  approximate  aperiodic 
convolutional  inverses  for  space- 1 i mi  ted  blurs  are  often 
nonzero  # over  large  domains  **. 

(4)  The  fourth  problem  is  caused  because  the  blurred 
Image  ie  truncated  abruptly  at  the  edges  of  the  film  by 
the  camera's  film  holder.  Special  edge  treatments  are 
required  to  suppress  artifacts  generated  by  these  edges. 


ft  See  the  footnote  in  Appendix  B for  an  interpretation  of 
the  term  nonzero. 


See  Appendix  A for  an  illustration  of  this  difficulty. 


n-y,  . i-  HI.  ' l i,|i.iinii|11  '.ini:,  n.  . ■■  i ■■■  , 
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Item  mumber  three  can  be  particularly  troublesome  because 
restoration  filters  for  use  uith  aperiodic  convolution  often  have 
very  long  Impulse  reeponsee,  and  arbitrarily  truncating  them  for 
compuatlonal  convenience  causes  severe  ghost-like  echoing  in. the 
restored  Image.  Figures  1.7  and  1.8  illustrate  this  problem  for  the 
artificial  blur.  Traditional  attempts  to  remedy  the  situation  by 
uindouing  have  proved  to  be  only  partially  successful.  See  Cannon 
12]  page  34  and  Cole  133  page  53  for  examples  of  echo- 1 ike  artifacts 
introduced  In  biur  removal  schemes  by  impulse  response  truncation. 

An  alternative  to  uindouing  is  to  take  advantage  of  the  fact 
that  both  the  blurred  input  Image  and  restored  image  are  usually  of 
finite  size  (often  the  same  size)  and  a convolutional  inverse  can  be 
truncated  uithout  introducing  echoes  provided  that  its  domain  is 
adequate  ft. 

The  approach  to  the  truncation  problem  taken  here  relies  on  the 
limited  epatial  extent  of  most  blur  impulse  responses  uhich  a I loue 
substitution  of  periodic  ft*  for  aperiodic  convolution.  The  blur 
systems  considered  here  completely  eliminate  signal  energy  at 
certain  epatial  frequencies  and  Beverly  attenuate  it  at  others 
nearby.  In  attempting  to  amplify  these  frequencies  adequately,  the 
system  ie  constrained  by  the  highpaBS  filter  so  that  noise  is  not 


ft  See  Appendix  B for  a discussion  of  constraints  on  the 
impulse  response  length  imposed  by  this  technique.  This 
approach  may  bs  impratical  because  of  the  large 
computational  effort  involved  in  obtaining  the 
convolutional  inverse  prior  to  truncation. 

ftft  Appendix  C shows  how  ths  sffects  of  circular  convolution 
may  be  obtained  by  filtering  the  log  spectrum. 


^ 1 ">T" ~ *•  M >■  ■ " ■ ' ^'  1 '"  '"J  w*n.w ■" 1 IP“"V  " Jl  " 

»iwirrniT*iwiii«n»niMfci  I ri  m - .-sfiiak^ac^-  £L:  &£  5 .;...  i. !S  ./^  , ~ ',- .,  „ ZJZ&M& 

Page  12 

amplified  excessive  I y.  The  next  four  figures  should  help  l I lustra,  e 
thie  Issue.  Success  of  this  substitution  is  due  to  the  fact  that 
both  kinds  of  convolution  give  Identical  results,  except  possibly 
near  Image  boundaries.  Note  that  results  identical  with  Figure  1.4 
are  obtained  if  Figure  1.2  is  extended  with  its  boundary  value, 
combined  with  Figure  1.3  by  aperiodic  convolution  and  the  reeult 
truncated  to  the  original  size. 

Figure  1.9  shows  the  frequency  response  associated  with  the 
artificial  blur  referred  to  earlier.  Frequencies  where  this 
function  is  zero  are  the  same  ones  where  its  logarithm  (Figure  1.10) 
increases  negatively  without  bound.  It  is  the  task  of  the  highpass 
filter,  Figure  1.11,  to  remove  as  much  of  this  signal  as  possible 
while  preserving  regions  of  large  negative  vaiue  to  prevent 
excessive  amplification  of  noise.  In  Figure  1.12,  the  blur  log 
spectrum  a haa  bean  almost  entirely  removed  except  In  the  region  of 
potentially  noise  dominated  frequencies.  This  has  the  effect  of 
restoring  the  Image  while  preventing  excessive  amplification  of 
no i se. 

This  same  filtering  effect  is  obtained  if  image  information  is 
present  along  with  the  blur,  but  part  of  the  image  is  removed  by  the 
filter  and  must  be  reinserted  in  the  form  of  an  equalization  signal 
like  the  one  in  Figure  1.13.  An  equi  I izatlon  signal  which  is 
properly  prepared  can  be  used  in  restoring  a variety  of  blurred 
images.  One  way  to  prepare  an  equalization  signal  is  to  lowpass 


•ft  Log  spectrum  is  used  to  mean  the  real  part  of  the 
compl.ex  logarithm  of  the  discrete  Fourier  transform. 
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filter  the  log  spectrum  of  a similar,  unblurred  Image  uelng  a filter 
whose  frequency  response  Is  related  to  that  of  the  hlghpaee  filter 
ae  fol lousi 

LPF  ( 03  ) - 1 ( 0) ) - HPF  ( 03 ) 

Filters  of  this  type  are  called  complementary  filters. 

Phase  information  of  the  blur  eystem  is  not  preeent  in  Figure 
1.10  or  Figure  1.12  and  no  method  of  removing  it  hae  been  described 
eo  far.  In  the  caee  of  the  artificial  blur,  the  phase  ie  known 
exactly  and  eubtracting  it  is  straightforward.  For  real  blur 
systems  encountered  In  the  field,  a method  euch  ae  the  one  by  Canr.on 
[21  may  be  used  to  estimate  the  phase  of  the  blur  after  which  It  may 
be  subtracted.  Figure  1.14  shows  the  phase  associated  with  the 
artificial  blur. 

In  Figure  1.15,  the  artificially  blurred  image  ie  reetored 
ueing  the  techniquee  described  above.  There  are  two  reasone  why  the 
retoration  is  not  Identical  with  the  one  in  Figure  1.6.  Flret,  the 
effects  of  noise  are  anticipated  but  none  is  preeent.  Thle  capacity 
for  dealing  with  noise  ie  not  needed  but  it  constrains  tKs 
restoration  as  if  it  were  needed.  Second,  the  exact  details  of  the 
blur  log  spectrum  wars  unknown  to  the  restoration  system.  This  Is 
known  as  "blind  deconvolution." 

Applying  theee  ideas  to  images  blurred  in  the  field  requiree  a 
special  edge  treatment  to  make  the  image  have  the  eame  value  along 
its  boundary  ae  ehown  in  Figure  1.19.  The  blurred  image  i«  treated 
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ae  though  It  has  been  replicated  periodically  and  a special 
interpolation  process  * is  invoked  near  image  boundaries  to  simulate 
periodic  blurring  across  the  boundaries.  In  this  setting,  queetione 
regarding  convolutional  inverses  having  large  spatial  extent  do  not 
arise  since  the  Inverse  Is  periodic  and  must  be  determined  cnly  for 
a single  period.  If  the  edge  treatment  is  successful,  no  further 
attention  is  required  with  regard  to  problem  four  (truncation  after 
blurring) . 

Flguree  1.16-1.22  give  an  idea  of  the  degree  to  which  a 
successful  restoration  of  images  blurred  in  the  field  is  possible 
using  this  method.  Note  how  the  small  text  in  Figure  1.19,  which 
was  blurred  beyond  legibility,  has  been  made  legible  in  the  restored 
Image  of  Figure  1.21.  Also,  the  outline  of  the  sign  is  much  more 
clearly  defined  in  both  Figures  1.18  and  1.21.  The  speckled 
appearance  of  Figures  1.18  and  i.21  is  due  to  amplification  of  film 
grain  noise  and  may  be  traded  for  sharpness  by  adjusting  the 
hlghpaso  filter  cutoff  frequency.  A minor  artifact,  barelu 
perceptible  In  Figure  1.21,  is  a shadow  manifesting  Itself  as  a dark 
copy  of  the  letter  "c"  to  the  left  of  the  word  convention.  This  may 


tit  The  Interpolation  process  consists  of  the  following: 


a.  Removing  the  average  value 

b.  Multiplying  the  image  by  unity  everywhere  except 

near  edges  and  there  by  half  of  a Hanning  window. 

Figure  1.23  illustrates  this  windowing  operation. 

c.  Reinserting  the  average  value. 


This  scheme  simulates  blurring  across  image 
boundaries. 
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be  due  to  regularly  spaced  harmonica  that  are  missing  from  the 


restored  image  or  perhaps  errors  in  estimating  the  phase  of  the 


blur. 


Several  alml lari tlee  between  the  frequency  eelectlve  model  cf 


vision  (Figure  1.24)  and  the  deblurring  mechanism  (Figure  1.1) 


suggest  a iimited  deblurring  capability  in  human  vision.  The 


Fourier  traneform  ie  often  naively  understood  in  terms  of  a bank  of 


bandpass  filters  similar  to  those  in  Figure  1.24,  a principal 


difference  being  the  lower  frequency  resolution  of  the  filtere  in 


the  visual  model.  The  AGC  elements  in  Figure  1.24  were  implemented 
using  Stockham’s  method  [101  for  separating  multiplied  signal e and 


ie  similar  to  the  logarithm,  highpass  filter,  and  exponential  in  the 


deblurring  mechanism.  One  may  view  the  equalization  signal  of 


Figure  1,1  as  providing  a fixed  gain  for  each  frequency  channel. 


There  are  Only  two  places  where  the  analogy  breaks  down.  There  is 


no  logarithmic  etage  operating  on  the  blurred  image  prior  to  the 


Fourier  traneform  in  Figure  1.1,  and  the  visual  model  has  no 


provision  for  correcting  phase  reversals  introduced  by  the  blur. 


For  Images  of  low  dynamic  range,  such  as  the  sign  in  Figure  1.16, 


the  logarithm  ie  approximately  linear  and  its  absence  is  not 


critical.  Faliure  to  correct  phase  reversals  Is  a significant 


difference  between  the  two  systems  and  ie  a major  limitation  on  the 


ability  of  ths  visual  system  to  restore  blurred  images. 


The  existence  of  this  deblurring  capability  In  human  vision 


might  help  explain  why  it  is  difficult  to  m?ke  improvements  in 


blurred  photographs  that  are  as  striking  as  might  be  expected  from 
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physical  considerations. 


1.2  Summary 

In  the  previous  semiannual  report,  experimental  evidence 
favr'ing  a frequency  selective  model  of  brightness  perception  has 
been  described  with  emphasis  on  the  model’s  texture  equalization 
property.  This  property  follows  from  eloctrophysiologica'  as  well 
as  psychophysical  experiments  and  forms  the  basis  of  an  enhancement 
procass  where  changes  in  contrast  ars  made  on  the  basis  of  texture 
strength  as  wall  as  texture  size.  The  eucceeeful  method  for 
retnovlng  unknown  photographic  blurs  based  on  this  property  described 
here  in  very  similar  in  Its  organization  to  that  of  the  visual 
model.  As  a result  of  this  similarity,  a limited  capacity  for 
restoring  blurred  images  is  claimed  for  the  visual  system. 

In  the  following  section,  a class  of  edge  related  Illusions  is 
considered  in  relation  to  a rather  different  model  of  brightness 
perception. 


I 


Convolutional  inverse  modified  to  avoid 
infinite  gain  at  spatial  frequencies  where 
the  blur  system  has  zero  gain. 


% 


Figure  1.6  --  Artificial  image  restored  by  inverse  filtering. 


Figure  1.7  --  Artificial  image  restored  using  a convolutional 
inverse  truncated  to  half  its  original  size. 
Truncation  caused  large  errors  in  the  average 
value  requiring  manual  corrections  to  make  the 
image  printable. 


— >15-  - f 


Sign  blurred  by  camera  motion 


nterpolation  function  used  to  make  the 
lurred  image  of  Figure  1.16  have  a constant 
alue  at  the  boundary. 
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Figure  1.24  -- 


The  frequency  selective  model  of  vision 
separates  an  image  into  channels  based  or 


itial  frequency  content  and  does  inde- 


pendent processing 


on  each  channel 
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Sectlon  2 

The  Modified  Retlnex  Model 
Brent  Baxter 


2.1  Introduction 

An  alternative  to  the  multiplicative  model  which  explains  the 
visual  system's  ability  to  reject  illumination  effects,  involves  a 
kind  of  edge  detector  mechanism  first  proposed  by  Land  [4,51.  Lard’e 
model  correctly  predicts  the  Cornsweet  illusion  but  does  not  readily 
land  Itself  to  Image  processing  because  it  is  not  suitable  for  use  on 
two  dlmene’onal  sampled  functions.  A modification  described  In 
connection  with  the  color  constancy  experiment  overcomee  thie 
difficulty. 

2.2  The  Retlnex 

The  term  retlnex  was  intended  by  Lend  to  convey  the  impression  of 
a neural  network  in  the  retina,  optic  nerve,  and  vieual  cortex  for 
extracting  lightness  * information  from  images.  To  do  thie,  ratios  of 
light  Intensity  are  formed  at  adjacent  points  along  a closed  path  and 
If  a ratio  is  different  from  unity  by  only  a email  amount,  It  Is  sat 
equal  to  unity.  To  extract  lightness  information  from  an  image,  the  *'j> 
ratios  are  multiplied  together  around  the  path.  The  region  crossed  by 
the  path  corresponding  to  the  largest  accumulated  product  ie  assumed 
to  be  white  and  Is  given  an  arbitrary  value  against  which  the 


fir  Lightness  la  Land’s  term  for  the  psychophysical 
correlate  of  reflectance. 
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lightness  at  other  pointe  ie  compared.  Necessary  and  eufficient 
conditions  for  lightnest  information  computed  in  this  wau  to  be 
Independent  of  the  path  are,  that  the  image  consist  of  patches  each 
having  uniform  reflectance,  and  that  the  path  croee  the  eame 
reference,  white,  patch.  Since  in  Land’s  formulation  the  path  ie 
arbitrary,  computational  complexity  ie  introduced  which  makes  the 
extraction  of  lightness  quite  difficult.  A modification  shown  in 

Figure  2.1  overcomes  this  problem. 

It  ie  easy  to  see  how  such  a system  might  reject  information 
about  the  average  level  of  illumination  since  all  lightness  values  are 
scaled  in  a way  that  assigns  reference  white  a predetermined  value. 
Lightness  values  thus  depend  on  ratios  of  reflected  light  rather  than 
on  the  Illumination  level.  Gradual  changes  in  illumination  across  an 
Image  are  also  rejected  by  the  retinex  because  light  ratios  within  a 
patch  will  be  close  to  unity  and  the  threshold  will  set  the  ratio 
equal  to  unity.  This  eliminates  any  information  about  gradual 
Illumination  gradients.  The  color  constancy  experiment  of  Section  2.3 
illustrates  both  of  these  properties.  The  new  implementation  of 
Land’s  retinex  consists  of  taking  the  logarithm  and  applying  a 
threshold  to  the  magnitude  of  the  gradient  rather  than  applying  a 
threshold  to  ratios  around  a closed  path.  This  permits  computations 
to  be  carried  out  on  a rectangular  grid  in  a systematic  way. 
Integrating  the  gradient  may  be  done  to  within  an  additive  constant  by 
an  appropriate  ft  line  integral,  and  the  constant  ie  supplied  in  the 
process  of  assigning  a predetermined  lightness  value  to  reference 
white.  A color  constancy  experiment  will  be  described  next  which 
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I I luetrateo  how  the  retinex  can  extract  lightness  information  in  the 
presence  of  Illumination  gradients. 

2.3  A Color  Conetancy  Experiment 

Our  Interest  In  this  experiment  stems  from  the  ability  of  visual 
system  to  perceive  colored  objects  in  correct  color  relationship  to 
each  other  even  under  widely  varying  conditions  of  illumination.  For 
example,  a blue  necktie  looks  nearly  the  same  indoors  under  yellowish 
Inuadcecent  lighting  ae  it  does  outdoors  under  much  bluer  natural 
light.  This  ability  is  known  as  color  constancy.  Tha  retinex  allowe 
for  thie  type  of  change  in  illumination  and  it  can  aleo  adjust  for 
gradual  ehlfte  in  illumination  hue  across  an  image. 

To  teet  theee  Ideae,  a board  covered  with  patches  of  colored 
paper  was  photographed  in  light  from  three  slide  projectors,  each  one 
fitted  with  either  a red,  green,  or  blue  filter.  The  red  projector, 
located  upward  and  to  the  ieft  of  the  board,  supplied  about  twice  ae 
much  light  to  the  upper  left  corner  as  it  did  to  the  lower  right 
corner,  the  green  projector  was  positioned  to  supply  about  twice  ae 
much  light  to  the  lower  right  as  to  the  upper  ieft,  and  the  blue 
projector  illuminated  the  board  evenly.  These  lighting  conditions 
resulted  in  the  pronounced  red-green  shift  of  Figure  2.3.  Note  how 
on*  of  the  white  patches  of  Figure  2.3  appears  pink  and  the  other 
eeeme  green.  The  vlaual  system  does  not  correct  an  illumination  eh)  ft 


* In  the  color  constancy  experiment  an  average  of 
integrals  taken  over  many  paths  of  integration  was 
used  to  minimize  the  effects  of  noise.  See  Wy I » e 111] 
for  a discussion  of  line  integration. 
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of  this  type  on  a reflection  print  as  well  ae  it  would  on  a projected 
image  in  a darkened  room.  The  reason  is  that  clue3  from  the  book, 
paper  etc.  tend  to  Inhibit  the  process.  The  red,  green  and  blue 
component e of  Figure  2.4  were  digitized  separately  and  each  one  wae 
passed  through  the  retinex.  In  Figure  2.5  the  lightnese  information 
ie  displayed.  Note  Its  similarity  to  Figure  2.2. 

Natural  scenes  often  consist  of  highly  textured  areae  rather  than 
the  cartoon  style  patches  prepared  for  this  experiment.  The  retinex 
does  not  work  well  on  tsxturs  end  this  is  a serious  defect.  The 
reason  le  that  Individual  patches  in  3 textured  image  may  be  email 
enough  to  occupy  a single  picture  element,  and  for  the  threshold  to 
work  properly,  edges  located  by  the  gradient  must  not  be  too  cloee 
together. 

2.4  The  Cornaueet  Illusion 

Figure  2.6  depends  for  Ste  effect  on  an  abrupt  change  in 
reflectance  next  to  a gradual  one.  The  abrupt  change  eeeme  to  be 
preserved  by  the  visual  system  and  the  gradual  one  Is  attenuated. 
This  property  of  the  visual  system  has  been  taken  advantage  of  for 
many  years  oy  artists  and  draftsmen  in  order  to  make  a region  in  a 
drawing  eeem  lighter  or  darker  that  it  would  otherwise.  Ratliff  163 
givee  an  Interesting  review  of  this  practice.  By  a coincidence,  thie 
ie  the  only  illusion  of  the  collection  which  is  predicted  correctly  by 
the  retinex  and  ignored  by  the  frequency  ss  I active  model.  Figure  2.7 
contains  plots  showing  how  lightness  information  may  be  extracted  from 
the  Images  In  Figure  2.6.  Note  that  the  gradual  change  in  intensity 
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hae  been  completely  eliminated  while,  the  abrupt  changes  are  preserved. 


2.5  Summary 

The  retinex  model  proposes  to  explain  how  lightness  Information 
may  be  extracted  from  Images  by  computing  ratios  of  reflected  light 
Intensity  at  adjacent  points  and  retaining  oneB  near  the  boundaries  of 
objects.  Neurophysiological  studies  have  failed  to  reveal  an 
organization  of  the  type  proposed  by  Land,  but  edge  Information  is 
known  to  be  important  to  perception  [123.  The  color  constancy 
experiment  of  Section  2.3  shows  how  the  retinex  is  capable  of  removing 
illumination  gradients  and  Figure  2.5  shows  retinex’  b correct  response 
to  the  Stimulus  which  produces  the  Cornsweet  illusion. 
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Section  3 


Conclusions 
Brent  Baxter 


3.1  Review 


The  new  frequency  selective  model  of  brightneee  perception 
described  in  the  previous  semiannual  report,  hae  been  shown  to  give 
subjectively  useful  information  greater  impact  where  It  ie  conveyed  by 
weak  texture.  Properties  similar  to  those  of  the  multiplicative  model 
are  retained  which  help  subdue  undesirable  Illumination  effects,  and 
the  apatlal  filters  acting  In  concert  with  a bank  Lf  AGC  elements  help 
overcome  a limited  degree  of  blurring.  The  model  is  the  baeie  for  a 
significantly  Improved  method  for  restoring  blurred  photographs  that 
gives  reeulte  free  from  echo-like  artifacts  common  to  earlier  methode. 
Restorations  of  Images  blurred  in  the  field  bu  both  an  out  of  focue 
lens  and  by  camera  motion  were  presented  to  demonstrate  the  method. 
In  Section  2,  a new  way  of  implementing  the  retinex  was  Illustrated 
using  a color  constancy  experiment  as  an  example. 


3.2  Speculation  About  Future  Research 


Phye  1 o I og I ca I.  evidence  is  quite  convincing  that  certain  neurone 
In  the  visual  cortex  respond  to  different  parts  of  the  two  dimensional 
Fourier  transform  of  an  image.  This  was  illustrated  by  the  adaptation 
experiment.  The  AGC  elements  are  an  attempt  to  model  the  adaptation 
part  of  the  demonstration,  however  thay  were  designed  to  make  the 
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image  enhancement  work  properly.  This  raises  the  question  of  whether 
they  could  be  cal.ibrated  by  an  experiment  similar  to  the  one  used  by 
Baudelaire  17]  in  calibrating  the  multiplicative  model.  Another 
question  which  suggests  itself  from  ths  appearance  of  Figures  3.1,  3.2 
and  3.3  Is  whether  this  enhancement  process  could  be  used  to  make 
images  more  immune  to  the  corrupting  effects  of  coding.  The 
multiplicative  model  has  been  used  successfully  for  this  purpose  [8,9] 
and  the  AGC  processing  should  bring  about  a further  Improvement.  This 
would  be  a step  forward  with  regard  to  finding  a more  accurate  measure 
of  image  distortion. 

Many  questions  about  the  image  restoration  process  remain 
unanswered.  One  of  them  is  ths  relationship  between  the  cutoff 
frequency  of  the  filter  (Figure  1.11)  and  the  quality  of  the 
restoration.  Circularly  symmetric  filters  were  used  in  the 
restorations  reported  here  but  it  seems  possible  that  improved  results 
would  be  obtained  if  the  symmstry  depended  on  the  type  of  biur 
(motion,  out  of  focus  etc.).  Also  the  preparation  of  equalization 
signals  should  be  investigated  further. 


Natural  scene  containing  variable  strength 
texture . 


Enhancement  of  Figure  3.1  based  on  the 
multiplicative  model.  The  apple  blossoms, 
which  were  quite  prominent  in  the  original 
(Figure  3.1),  are  further  accentuated  by 
the  enhancement  operation. 


Figure  3.3 


--  Image  enhance  .ant  based  on  the  frequency 

selective  model  results  in  a kind  of  texture 
equalization  where  weak  detail  in  the  dome 
and  sky  are  strongly  accentuated  relative  to 
corresponding  parts  of  Figure  3.1  while  the 
apple  blossoms  are  given  a softer  rendition. 
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APPENDIX  A 


Notee  on  the  impulse  response  length  of 
convolutional  inverses  for  space  limited 
blurs. 


Deconvolution  ie  the  process  of  separating  signals  that 
have  been  combined  by  convolution  and  it  ie  ordinarily  done 
ultri  the  aid  of  the  Fourier  transform. 


a > (a©b)  ©b'1-*— -- ►(A  # B)l/B 

Suppose  a signal  Is  added  to  a del  ay  ad  and  scaled  copy  of 
iteelf.  A convolutional  representation  of  this  situation  is, 


a © b - a(t)  + c*a(t  - T) 


(O 


IT 


The  convolutional  Inverse  of  b can  bs  found  by  inspection  and 
Is  an  infinite  train  of  impulses. 


Note  that  for  c-1  the  Impulses  do  not  tend  toward  zero  for 





Page  48 


large  X . 


A motion 
representation 


blur  imp  I use 
much  like  the 


response  has  a sampled 
case  for  c-1  and  its  discrete 


convolutional  inveree  is  of  infinite  extent.  Truncating  euch 
a sequence  will  introduce  errors  in  the  deconvolution. 
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APPENDIX  B 


Notes  on  truncation  of  discrete 
Impulse  response  filters 

Suppose  two  sequences,  one  of  which  is  zero  outside  a 
finite  Interval,  are  to  be  convolved  aperiodical ly. 


g ( J ) ■ £ f ( i ) tVh  ( J - 1 ) 

i=-°° 


where,  f(i)  • 0 for  i<k  and  i>l*  K<l 


If  h( j)  Is  nonzero  * over  the  infinite  interval  » <j<  » the  eum 
above  must  be  taken  over  all  i and  g(j)  will  also  be  nonzero 
over  the  Infinite  interval. 

Computational  difficulties  implied  by  the  infinite  sum 
may  be  avoided  if  g(j)  is  only  required  over  part  of  its 
domain  as  In  image  processing  where  the  output  image  is  often 
made  equal  In  size  to  the  input  image.  If  g(j)  muet  be 
computed  only  for  m<-j<«n,  then 

1 

g( J)  - I f ( l ) *h ( J - 1) 


l_K+i  terms  are  required  in  the  9um,  the  sum  must  be  evaluated 
n-m+1  times  and  h ( j ) mu3t  be  available  for  m-l <« j<-n-k. 
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If  the  lengths  of  input  and  output  sequences  are  l-k+1 
and  n-m+l,  the  impulse  response  must  be  m-l+n-k+1  points  long. 
For  example,  if  Input  and  output  sequences  are  25S  points 
long,  511  points  of  the  infinite  Impulse  response,  n(J),  will 
be  •'equlred.  Similar  arguments  apply  in  two  dimensions. 


ft  By  the  expression  nonzero  over  an  interval  it 
le  meant  that  elements  of  the  sequence  may  be 
nonzero  ulthln  the  Interval  but  they  are  zero 
outstde  the  interval. 


SAMUS 
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APPENDIX  C 

Circular  deconvolution  of  periodic 
••quinces  by  filtering  their  log 
spectra 

Periodic  sequences  combined  by  circular  convolution  may 
be  separated  using  the  complex  logarithm  v<  and  the  convolution 
property  of  the  discrete  Fourier  transform  (DFT) . If  a,b  are 
sequences  having  the  same  period  and  A,B  are  their  discrete 
Fourier  transforms,  then 

, FT  Log 

a - (a  ® bl  © b ■♦—►(A  * B)/B $ + E)  - f - 

The  square  overbracket  Indicates  the  complex  logarithm. 

B may  be  removed  by  direct  subtraction  if  it  is  known 
exactly  or  by  linear  filtering  if  its  DFT  is  disjoint  from  the 
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APPENDIX  D 

An  Image  Processing  Language  (I PL) 

Experiments  or  the  type  described  herein  were  greatly 
facilitated  by  the  development  of  a programming  larguage  C29) 
whose  data  typee  are  images  and  whose  primitives  operate  on 

imagee.  Typical  IPL  primitives  al low  manipulation  of  imagee 

through  addition,  the  discrete  Fourier  transform  and 
magnitude-phase  computations  to  name  but  a few.  As  a further 
experimental  convenience,  these  operations  may  be  Invoked 
separately  as  a command  language  by  typing  interactively  on  a 
computer  terminal  or  they  may  be  combined  into  an  IPL  program 
and  interpreted  by  the  language  processor.  The  convenience 
provided  by  thie  approach  makes  it  feasible  to  explore  many 

more  poeeibi  I i ti  es  than  if  each  experiment  required  that  a 

separate  program  be  written,  compiled  and  debugged  using 
traditional  editors,  compilers  and  debugging  aids. 


i 
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Section  4 

Color  Image  Processing  in  the  Context  of  a 
Three-D  i mens  Iona',  Homomorphic  Model 
Olivier  Faugsras 

Ue  have  been  working  for  the  last  six  months  on  a model  of 
human  vision  which  extends  Stockham's  model  tl33*  to  include  color 
phenomena  and  can  also  include  the  new  Ideas  involving  the  existence 
of  separate  frequency  channels  in  our  brightness  vision  system. 
Although  the  old  [12,2]  and  new  C33  results  concerning  black  and 
white  vision  have  been  constantly  used  in  our  research,  our  major 
effort  has  been  directed  toward  an  understanding  of  problems  related 
to  color.  The  model  which  takes  into  account  the  most  recent 
knowledge  about  the  physiology  of  cclor  vision  Is,  in  its  oldest 
version,  a three-dimensional  homomorphic  system  and  is  thus  built  on 
Ideas  this  group  haa  been  familiar  with  for  a long  time  and  have 
been  shown  to  be  successful  and  fruitful. 

To  get  straight  on  notations,  let  us  review  some  related 
mathematics:  a color  image  can  be  represented  as  a function  I of 

three  variables  x,  y,  AA  where  x and  y are  the  spatial  coordinates 
and  A Is  the  wavelength  of  the  light  reflected  from  the  print. 

To  explain  the  results  of  trichromatic  matching  experiments,  it 
has  been  hypothesized  that  the  human  retina  has  three  types  of  cones 
with  three  different  absorption  curves.  This  hypothesis  has 
recently  been  confirmed  by  reflexion  densitometry  measurements  in 
the  living  eye  of  normal  man  [13  and  by  absorption  spectrum 


* See  references  at  the  end  of  this  section. 
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measurements  of  single  cones  in  excised  retinae  from  man  and  monkey 
[91. 

The  effect  of  this  cone  absorption  can  be  modelled  by  the 
following  equations: 

J1(x,y)  - / I (x,y,x)a1(x)dx  ' - 1.2,3  ID 


where  the  integral  is  taken  over  the  visible  spectrum  and 
where  a^X)  (i  - 1,2,3)  is  the  absorption  spectrum  of  the  i th  type 
of  cone. 

Tha  naxt  fact  to  be  considered  In  the  model  Is  that  the  totai 
cone  response  Is  not  linear  but  a monoton i cal  I y increasing  convex 
function  which  can  very  well  be  approximated  by  a logarithm  function 
[4)  [51 : 

Dl(x,y)  - LogU^x.y)]  i - 1,2,3 

The  most  recent  physiological  studies  [73  [53  have  also  shown  that 
after  those  two  stages  lateral  inhibition  was  present  between  cones, 
fact  that  we  can  model  by  the  cascade  of  two  systems: 


— ona  amnesic  linear  system  defined  by: 


(x,y)  - a*D1(x,y) 

< G2  (x,y)  - |J  *(d2(x,y)-d1(x,y)) 

G3  (x,y)  - Y*(D3(x,y)-D1(x,y)) 

»n 

where  <*,£,  Y are  three  carefully  chosen  constant  numbers  (Frei). 


> 
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— One  linear  apace  Invariant  system  with  memory,  defined  by 
three  impulse  responses  h ^ (x ; y) , h2(x,y),  h^ix.y)  or  equivalently  by 
three  frequency  responses  ( f 1, f 2) • H2<f1*f2>*  H3(f1. f2)  • 

T£(x,y)  ■ G^fth^  (x,y)  i <•  1,2,3  (4) 

ft  denoting  two-dimensional  convolution 

At  this  point  • block  diagram  of  the  total  model  might  hulpt 


Step  1 


Step  2 


Step  3 
Figure  1 


Step  4 


The  vector  T(x,y)  can  be  considered  as  the  representat ion  In  some 
three-dimensional  "perceptual  space"  of  the  original  Image  I(x,y,A). 


It  must  be  pointed  out  that  all  steps  but  one,  namely  the 
firet,  in  this  mode1  are  invertible.  This  non-invertibi  I i ty  ie  not 
really  going  to  hamper  us  since  it  has  been  known  since  Helmholtz 
that  any  image  can  be  represented  in  terms  of  only  three  primaries 
(red,  green,  blue  for  example).  This  indicates  that  although  the 
first  transform  is  destroying  almost  all  wavelength  Information, 
thle  fact  Is  of  no  conse"  lence  at  all.  Actually,  for  the  purpose  of 
our  work,  we  might  as  well  entirely  ignore  it  and  consider  that  the 
Images  wa  are  working  with  are  defined  as  three  images  or 
equivalently  as  a vector  I(x,y),  the  three  components  of  which 


Page  56 


correspond  to  the  red,  green  and  blue  contents  of  the  original 
picture. 

The  cone  abeorptlon  transform  now  becomes  a eimple  amnesic 
linear  transform  corresponding  to  a change  «f  coordinate  system. 
The  change  is  from  the  one  used  to  obts'sn  the  three  separation 
prints  to  the  one  defined  by  the  three  cones  absorption  curves,  thus 
making  the  whole  system  entirely  invertible. 

Before  going  Into  the  work  we  have  been  doing  with  this  model, 
let  ue  review  a few  more  interesting  facts  concerning  its 
relationship  to  previous  and  recent  models  for  biack  and  white 
vision  that  have  been  successfully  used  in  this  group;  It  can  be 
shown  that  one  cone  absorption  curve  is  very  closely  approximated  by 
the  CIE  relative  luminous  efficiency  function  and  thus  it  is  built 
into  the  model  that  the  brightness  information  is  present  in  only 
one  channel  (we  chose  the  first  one  or  the  first  coordinates)  of 
every  vector;  this  brings  up  the  fact  that  in  the  form  of  Figure  4.1 
and  for  a black  and  white  image  (for  which  all  three  componente  of 
J(x,y)  are  equal)  G(x,y)  is  of  the  form  EG^  (x, y)  , 4>3  where  G^x.y) 
is  the  brightness  information.  Thus,  the  frequency 
response  H^If^.f^)  ^nown  U8  anc*  can  *a^en  38  one  ueec* 
by  Stockham  C131  or  as  the  one  measured  by  Baudelaire  121. 

In  this  sense,  this  model  can  be  considered  as  an  extension  of 
Stockham’ s.  Also,  since  all  brightness  information  is  contained  in 
Channel  1,  it  can  readily  be  seen  that  more  recent  results 
concerning  the  existence  of  separate  frequency  channels  [3]  are 
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eael  ly  Included  by  juet  acting  on  Channel  1 after  Step  3. 

We  are  now  free  to  concentrate  our  attention  on  Channele  2 and 
3 which  correspond  to  the  chromatic  information  and  especially  on 
the  two  frequency  responses  H 2 ( f lf f 2)  and  H3(f1,f2)»  Very  little 
work  hae  been  dona  to  measure  those  modulation  transfer  functions 
and  It  la  one  of  the  purposes  of  our  Study  to  contribute  to  thie 
measurement  and  ehow  that  some  phenomena  related  to  color  illueione 
that  hava  puzzled  people  for  long  (color  constancy,  color  contraet) 
can  be  explained  by  thie  model.  Another  purpose  of  thie  work  le  to 
ehow  that  processing  similar  to  the  one  performed  by  Stockham  on 
black  and  white  images  udn  also  be  done  in  color.  Finally,  a third 
purpose  le  to  study  the  effects  o*  a visual  fidelity  criterion 
defined  by  the  model  for  the  encoding  of  color  images. 

It  must  be  pointed  out  that  all  of  these  purposee  are  closely 
related.  To  give  an  examples  the  human  visual  system  can  easily 
discard  chromatic  Illumination  in  a very  large  range  of  intensitlee 
(color  conitincy  effect  as  shown  by  experiments  by  Land  (83 » , *hlo 
fact  gives  us  skatchy  information  on  the  shape  of  H2  and  H3  near  the 
origin;  this  information  in  turn  can  be  used  both  to  perform 
enhancements  and  also  to  preprocess  prior  to  coding.  We  now  proceed 
to  deecrlbe  our  accomplishments. 

4.1  Color  Image  Processing 


I 


Let  us  first  review  our 
Images.  The  main  idea  is 


work  with  the  proceseing 
two-foldi  in  the  processing 


of  color 
of  a color 
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Image  we  might  want  to  do  two  things: 

--Increase  the  saturation  of  objects  present  in  the 
ecene.  Since  objects  on  a picture  tend  to  be  small,  this 
Implies  that  our  frequency  responses  H2  and  H3  should  have 
large  values  at  high  frequencies. 

--Get  rid  of  any  chromatic  illumination  that  might  be 
caused  either  by  a smalt  error  in  ths  color  balance  during 
the  printing  process  or  by  the  actual  presence  of  a strong 
chromatic  Illumination.  This  implies  that  H2  and  H3 
should  have  low  values  (less  than  1)  near  the  origin. 

Using  those  Ideas,  an  experiment  has  been  devised  to  enhance 
color  images.  Real-life  color  pictures  have  been  scanned  into  the 
computer  using  the  separation  print  method: 

A color  negative  is  used  to  produce  three  black  and  white 
prints  on  paper  exposed  through  three  different  filters  (Red  92, 
Green  99,  Blue  47B) , each  print  being  afterwards  separately  scanned 
in. 

The  results  are  sho.-:n  on  Plates  I and  II:  the  "original" 
reproduced  from  the  original  bits  stored  on  disk,  and  the  chromatic 
enhancement."  This  latter  is  obtained  by  processing  every  image 
using  the  model  of  Figure  4.1  with: 


h,  (x,y) 


<5(x,y) 


(no  brightness  procsssing) 
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H2Cfi.f2)  snd  H3(f]_,  f2)  have  been  taken  to  bo  circularly  eymmetric 
with  ths  crose-sections  shown  on  Figure  4.2. 


1.5  — 


(cycles/picture) 


Figure  4.2 


Also  shown  are  the  same  image  processed  again  by  the 
previous  H2  and  H3  but  by  taking  also  H^ff^,  f2)  cross-section  as 
shown  In  Figure  4.3. 


Figure  4.3 


This  last  processing  intends  to  show  that  results  obtained 
previously  by  Stockham  in  his  enhancement  work  with  black  and  white 
images  can  be  successfully  e <tended  if  applied  to  the  brightness 
channel  of  this  color  model  which  was  by  no  means  obvious 
beforehand.  Results  are  shown  on  Plate  III. 


_4.  "Www. 
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Thle  second  processing  may  be  called  "total  homomorphic 


enhancement. 1 


4.2  The  Use  of  a Distortion  Measure  in  the  Encoding  of  Color  Images 

Ue  also  said  before  that  we  wanted  to  investigate  the 
possibilities  offered  by  the  model  to  define  a distortion  measure 
between  etlll  color  imageB 

Let  T(x,y)  be  ou"  original  image  that  we  want  to  transmit  over 
eome  noisy  channel  and  Hx,y)  the  received  image.  Ue  would  like  to 
be  able  to  define  a distance  or  distortion  measure  between  these  two 

■V 

Images  such  that  d(t.lf)  is  in  agreement  with  subjective  evaluation. 
The  reeeon  why  ue  are  Interested  in  such  a distortion  measure  ie  the 
fclloulngt  Shannon’s  rate-distortion  function  C12I  provides  a useiul 
lower  bound  against  which  to  compare  the  rate -verBus-di star t i on 
performance  of  practical  encoding-transmission  systems  by  the 
fol  lowlngi  The  distance  d being  defined,  the  performance  of  the 
eyetem  Is  measured  by  the  average  distortion: 


dft  ■ E {d ( I • I ) (5) 

where  the  expected  value  is  taken  over  tha  ensemble  of  images  of 
Interest.  Shannon’s  rate-distortion  function  R(d*)  is  a lower  bound 
of  the  transmission  rate  required  to  achieve  average  distortion  hvj 
moreover,  Shannon's  coding  theorem  also  states  that  one  can  design  a 
code  with  rate  only  negligibly  greater  than  R { d* ) which  achieves 
average  distortion  d>v.  This  function  R(d*)  thus  exactly  specifies 
the  minimum  achievable  transmission  rate  R’required  to  transmit  an 
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image  with  average  dietortion  level  d*  and  providee  an  absolute 
yardstick  against  which  to  compare  the  performance  of  any  practical 
system. 

To  date,  this  potential  value  has  not  been  realized  for  image 
transmission  for  several  reasons.  One  reason  is  that  there  does  not 
currently  exist  any  tractable  mathematical  models  for  an  image 
source.  A second  reason  is  the  difficulty  of  calculating  the 
rate-d i etort i on  function  'for  other  than  Gaussian  sources  and 
equare-error  distortion  measures. 

However,  it  ie  generally  recognized  that  the  prime  reason  that 
rate-dl etortlon  theorg  le  not  applicable  le  that  a distortion 
measure  in  agreement  with  subjective  evaluation  of  image  quality  is 
not  known.  Since  the  model  we  are  using  embodies  the  most  recent 
knowledge  about  black  and  white  and  color  vision,  it  ie  very 
appealing  to  use  it  to  define  a class  of  distortion  measures  in 
order  to  calibrate  and  test  it  further.  It  would  then  be  poeeible 
to  compare  different  distortion  measures  in  the  class  by  simulating 
the  encoding  of  a fixed  image  at  a fixed  rate  under  different 
dietortion  measures  and  subjectively  Judging  the  quality  of  the 
encoded  Images. 

If,  for  a variety  of  source  images  and  rates  of  interest,  many 
uubjecte  uniformly  pick  images  encoded  under  the  same  dietortion 
meaaure  as  appearing  best,  then  clearly  that  measure  is  the  most 
appropriate  in  the  class  to  use  for  evaluating  transmitted  images. 
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Let  ue  now  describe  the  class  of  distortion  measures  defined  by 
the  modal i remember  we  assume  that  a color  images  ie  a 
three-dimenaionai  vector  I(x,y)  where  x and  y are  the  epatial 
coordinates  and  that  ths  model  is  a homomorphic  system  mapping 
I(x,y)  to  T(x,y).  Now  given  two  images  I(x,y)  and  i(x,y),  we  can 
define  the  distance  between  them  as  a set  of  three  real  positive 
number*  a2  a3  euch  that: 

“l  “ //CT1(x,y)-Ti(x,yi]2dxdy  1 -1.2,3  <6) 

(Notice  that  this  is  not  a distance  in  the  normal  mathematical 
sense.)  If  one  assumes  that  for  al  I Images  considered,  only  the 
eecond-order  etati sties  (mean  and  correlation  function)  are  known 
then  for  distortion  msasurss  of  the  class  we  are  considering,  the 
Gaussian  distribution  is  the  worst  in  the  class  of  all  probability 
distributions  ot  a random  field  with  given  mean  and  correla^. on 
function  (111(121.  But  this  would  not  be  of  great  interest  without 
the  following  result  that  the  optimum  code  for  the  Gaussian  source 
which  yields  average  distortion  d*  is  robust,  i.e.,  it  yields 
average  distortion  less  than  or  equal  to  d>v  for  any  source  in  the 
class  (101. 

This  solves  the  problem  of  not  knowing  the  9tati9tic9  of  the 
source.  It  al lour  us  to  compute  the  rate-distortion  function  for 
our  class  of  source  distributions  as  the  rate-di  stort  i on  function  of 
the  Gaussian  source  and  simulate  the  optimum  encoding  for  the 
Gaussian  source. 

- - - , iMrltitfiMT ifWil  II  r I 
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It  was  noted  that  the  homomorphic  model  defined  a claee  of 
di etort 1 on  measures.  By  this,  it  is  meant  that  some  parametere 
defining  the  model  were  going  to  be  made  vary.  Let  us  now  be  more 
specific  about  this  point.  So  far  we  have  only  Investigated  the 
effect  of  varying  the  two  frequency  responses  H2  and  H3.  He  did  not 
Investigate  variations  of  H since  this  has  been  done  recently  til]. 
These  results  were  closely  correlated  to  these  obtained  by  Stockham 
and  Baudelaire  In  a different  context  C133  C23 . Similarly,  we  did 
not  question  any  other  part  of  ths  model  like  stage  3 t5] , etage  2 
[11],  or  etage  1 tG]  [14] , but  this  obviously  could  be  done. 
Chromatic  frequency  responses  have  rarely  been  measured  although 
some  work  has  been  done  on  this  subject  but  never  to  our  knowledge 
In  a framework  similar  to  ourB.  Just  as  for  the  brlghtneee 
frequency  response,  the  chromatic  frequency  responses  are  assume-4  to 
be  circularly  symmetric,  but  with  the  high  frequency  rolloff  thought 
to  occur  much  sooner.  There  is  quite  a lot  of  controversy  about  the 
existence  of  a low  frequency  rolloff  similar  to  the  one  occur ing  for 
br i ghtness. 

Those  two  points  are  vary  important  for  different  reasons:  the 
early  high  frequency  rolloff  corresponds  to  the  fact  well  known  to 
televl elon  engineers  that  color  information  needs  a much  narrower 
bandwidth  than  brightness  in  order  to  be  transmitted  with  satisfying 
accuracy.  The  low  frequency  rolloff  would  account  for  such 
phenomena  as  color  constancy  and  simultaneous  color  contrast. 


j 

i 
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In  order  to  test  those  ideas,  an  experiment  has  been  designed 
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which  can  be  described  as  follows:  The  optimum  encoding  and 
tranemieeion  of  a color  image  I(x,y)  has  been  simulated  in  the  sense 
explained  above  at  two  different  rates  (.1  bits/pel  and  .05 
blts/pel),  and  for  different  frequency  responses  H2  and  H3  chosen 
from  a set  of  4 filters  (see  Figure  4.2)  which  peak  at  1.  2,  4 and  6 
eye  lee/degree.  At  both  rates,  the  pictures  looking  the  most  like 
the  original  as  far  as  color  information  is  concerned  are  the  one 
processed  through  the  model  whsre  H2  and  H3  are  peaking  at  1 or  2 
cycles/degree.  At  this  point,  it  might  be  useful  to  Insist  again  on 
the  fact  that  wo  are  exclusively  interested  in  color  information  eo 
that  in  all  these  simulations  the  brightness  information  is  left 
untouched.  These  results  strongly  suggest  a very  early 

high-frequency  rolloff  of  ths  chromatic  frequency  responses,  which 
1 8 In  strong  agreement  with  other  studies. 

The  fl  Iters  used  al  I havs  a low-frequency  rolloff.  Further 
studies  are  planned  in  order  to  explore  the  influence  of  roiloff 
characteristics.  Low  frequency  rolloff  was  investigated  first 
because  of  the  experimental  results  described  In  Part  4.3.  The 
"original"  CAR-PORT  Is  shown  on  Plate  I The  result  of  the  simulation 
at  .05  bits/pel  with  H and  H peaking  at  6 cycles/degree  is  shown 
on  Plate  IV,  and  2 cycles/degree  is  shown  on  Plate  V. 

4.3  Color  Illusions  and  Psychophysics 

Finally,  we  also  said  that  we  were  going  to  contribute  to  the 
measurement  of  chromatic  modulation  transfer  functions.  It  has  been 
shown  by  Baudelaire  (2)  that  many  brightness  illusions  such  as 
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simultaneous  contrast,  Herman  Grids,  and  ilach  Bands  can  be  explained 
by  a Ion  frequency  alternation  in  H^.  The  results  of  his  study  tihow 
that  in  the  range  0-2  cycles/degree,  which  is  the  range  of 
frequencies  where  brightness  contrast  effects  are  che  moet 
prominent,  the  following  relation  was  approximately  true: 

- 2H  (^)  ( /is  the  radial  frequency)  (7) 

This  relationship  indicates  a Btrong  low  frequency  rolloff  in  H . 
He  started  to  investigate  whether  effects  similar  to  simultaneous 

: 

brightness  contrast  were  obtainable;  the  answer  was  found  to  be  yes. 
These  effects  are  called  simultaneous  color  contrast.  Compensation 
experiments  conducted  on  the  author  have  indicated  that  a relation 
similar  to  (7)  seems  to  hold  for  H2  and  H3.  These  results  need  to 
be  made  more  precise  by  experimenting  on  several  observers  and 
standardizing  the  viewing  conditions. 

4.4  Future  Directions 

He  essentially  plan  to  go  on  with  experiments  related  to  topics 
4.2  and  4.3  in  order  to  get  more  precise  and  more  complete  results. 
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Steven  F.  Boll 


A technique  for  matching  two  digitized  waveforms  wae  developed. 
Two  waveforms  are  compared  by  using  the  predictor  coefficients  from 
one  waveform  and  the  data  samples  from  the  other  to  generate  a mixed 
parameter  error  signal.  Dividing  the  energy,  of  thie  error  signal 
by  the  minimum  error  energy,  defines  a ratio  whose  divergence  from 
one  measures  their  dieeimi larity. 


This  technique  was  applied  to  two  areas:  (1)  speaker 
authentication!  and  (2)  Isolated  word  recognition.  For  the  speaker 
verification  experiment,  the  ssntencs  "May  we  a I learn  a yellow 
lion  roar."  9B  spoken  by  tsn  speakers,  was  recorded  on  six 
occasions,  three  each  day,  spaced  a week  apart.  On  this  data,  a 
verification  accuracy  of  95%  was  obtained.  In  parallel  with  this 
effort,  was  the  application  of  this  waveform  matching  technique  to 
the  problem  of  isolated  word  recognition.  A vocabulary  of  107  worde 
recorded  nine  times  was  used.  Necessary  auxiliary  algorithms  were 


developed  for  word  boundary  detection,  non-linear  time  registration 
by  dynamic  programming,  removal  of  redundant  information  and 
Improved  raference  parameter  sets  through  averaging.  A recognition 
accuracy  of  87X  was  obtained.  Ths  theoretical  development  of  the 
waveform  matching  technique  and  details  of  the  speaker  verification 
experiment  are  described  in  University  of  Utah  Computer  Science 
Technical  Memorandum  7500,  May  S,  1975. 
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Ssctlon  6 

Improving  Synthetic  Speech  Quality  of  Low  Bit  Rate 
LPC  Vocoder  Systems 
Steven  F.  Boll 


Methods  for  Improving  low  bit  rate  synthetic  speech  quality 
were  developed  both  by  modifying  the  existing  LPC  vocoder  systems 
and  by  Investigating  new  techniques  of  speech  analysis.  Synthetic 
speech  obtained  by  linear  prsdiction  is  minimum  phase  and  uhnn 
listened  to  on  headphones  exhibits  an  annoying  buzzy  quality. 
However,  when  heard  in  room  over  I oudsooakerB  where  the  phase  has 
been  dispersed,  the  buzziness  io  Ibsb  noticeable  and  the  quality 
subjectively  better.  By  modeling  the  room  as  a linear  stationary 
system,  the  effect  of  room  r6verbsration  on  synthetic  speech  heard 
over  a headset  can  be  approximated  by  measuring  the  room’s  impulse 
response  as  recorded  by  two  microphones  spaced  the  ear’s  distance 
apart , convolving  the  synthetic  speech  with  each  of  the  impulse 
response,  and  playing  the  resulting  reverberated  speech  through  each 
headset  channel.  Preliminary  experiments  demonstrate  that  the 
annoying  qualities  of  LPC  speech  ars  noticeably  reduced  using  this 
technique.  Further  refinements  to  this  process  of  applying  binaural 
reverberat ion  are  being  considered. 

A new  technique  of  spsech  analysis  called  cepetral  prediction 
is  being  investigated.  By  applying  linear  prediction  to  the 
cepBtrum  of  the  signal,  a synthetic  waveform  can  be  generated  whose 
spectrum  matches  the  logarithm  of  ths  original  spectrum  rather  than 
the  spectrum  Itself.  The  advantages  of  log  spectral  matching  using 
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cepetral  prediction  over  standard  LPC  for  generating  high  quality, 
low  bit  rate  synthetic  speech,  are  being  investigated.  Aieo  being 
considered  is  the  possibility  of  using  this  technique  to  analyze 
speech  corrupted  by  both  additive  and  convolutional  nolee,  where  It 
wau  demonstrated  (Ml  Her,  1974)  that  homomorphic  processing 
suppresses  noise  In  acoustic  recordings. 

It  can  be  shown  that  by  ramping  the  cepwtrum  and  applying 
linear  prediction,  that  the  resulting  synthetic  spectrum  matches  not 
only  resonant  frequency  peaks  but  also  frequency  valleys  between 
peaks  as  well  as  valleys  due  to  vocal  tract  zeroB  Buch  38  found  in 
nasal s.  These  spectral  matching  properties  are  currently  being 
investigated  with  the  aim  of  incorporating  this  method  into  a 
complete  high  quality,  low  bit  rats  analysi s-aynthesl s system. 

The  theoretical  details  of  spectral  matching  by  linear 
prediction  are  given  in  the  University  of  Utah  Technical  Memo  #7500, 
May  8,  1975.  The  theoi  y supporting  the  technique  of  applying  linear 
prediction  to  the  ramp  modulated  cepetrum  wae  published  In  tho 
"Proceedings  of  the  IEEE  1975  Region  Six  Conference  on 
Communications”  by  A.  Atashroo. 
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Section  7 

Suppression  of  Noise  from  an  Audio  Signal 
Tracy  Petersen 

The  problem  addressed  here  le  the  suppression  of  additive 
broad-band  etatlona~y  noise  from  a signal  whose  spectral 
character  i et  i ce  vary  with  time.  The  solution  to  this  problem  has 
important  applications  wherever  a time  varying  signal  has  been 
polluted  with  additive  stationary  noise — possible  examples  being  the 
interpretation  of  distress  messages  or  3ignale  intercepted  from  a 
noisy  environment. 

Because  the  eignal  of  interest  varies  with  time,  it  ie 
deelreabie  to  design  a noise  suppression  filter  that  continuously 
adapts  Itself  to  the  characteristics  of  the  signal.  A major 
difficulty  In  the  realization  of  such  a filter  is  that  i te  frequency 
response  must  change  smoothly  and  in  a well  behaved  manner  between 
successive  discrete  time  estimates  of  the  optimum  noise  suppression 
f i i ter. 

An  Important  result  of  this  research  has  been  the  development 
of  a practical  method  which  facilitates  changing  the  frequency 
response  of  a filter  smoothly  in  time.  wherfj  spectral 
characteristics  are  periodically  specified  in  the  frequency  domain. 
This  filtering  echeme  represents  a new  approach  to  the  dynamic 
filtering  of  signals,  as  well  as  a new  application  of  a filter 
traditionally  used  for  speech  synthesis. 
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Linear  prediction  theory  has  demonstrated  the  effectlvenese  of 
the  lattice  form  digital  Miter  for  speech  synthesis  from  a set  of 
speech  analysis  parameters  [1,23.  An  implementation  of  this  same 
lattice  filter  has  been  developed,  which  makes  possible  the 
arbitrary  and  dynamic  filtering  of  the  signals,  where  the  frequency 
response  of  the  filter  is  modeled  as  an  all-pole  transfer  function. 
The  implementatoln  of  this  method  is  as  follows- 

In  the  linear  prediction  analysis-synthesis  of  speech,  the 
central  parameters  to  the  lattice  filter,  known  in  the  literature  as 
k-parametere  111,  are  derived  from  the  short  time  autocorrelation  of 
the  speech  signal.  The  frequency  response  of  the  lattice  filter  in 
this  case  models  the  short  time  spectral  envelope  of  the  speech.  If 
a set  of  k-parametere  are  derived  from  the  autocorrelation  of  the 
Impulse  response  of  a filter  speicified  arbitrarily  in  the  frequency 
domain  as  a prototype,  the  frequency  response  of  the  lattice  filter 
controlled  by  this  set  of  k-parameters  will  model  the  prototype 
filter  131.  Thus,  if  it  is  desired  to  change  the  spectral 
character i sties  of  the  filter  smoothly  in  time  from  an  initial  to  a 
target  configuration,  this  can  be  realized  by  first  deriving  two 
sets  of  k-parameters  corresponding  to  Ahe  initial  and  target  filter 
configurations,  respectively,  and  then  smoothly  interpolating  the 
K-parameters  In  time  from  the  initial  to  the  target  uet.  This  means 
the  lattice  filter  will  have  a new  set  of  control  parameters  at  each 
sample  point  In  time. 


A dynamic  version  of  the  Uiener  filter  h?s  been  realized  with 
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this  technique,  and  has  been  used  with  success  to  suppress  the 
surfece  noise  from  an  old  acoustic  recording  of  singing  voice.  This 
requires  continuously  updating  the  target  filter  estimate  over  short 
time  intervale.  If  filter  update  intervals  are  too  far  apart  in 
time*  eome  unwanted  noiee  will  bs  admitted  Into  the  filtered  signal. 
This  research  has  found  that  noise  suppression  from  a singing  voice 
requires  an  update  interval  on  the  order  of  150  milliseconds.  Each 
time  the  power  spectrum  of  the  filter  is  determined  ae 

$ (s+n)  -3>  (n)  _ 

4>(s+n)  w 


where  ^ ( s+n)  is  the  power  spectrum  of  the  noiey  input  eignal  from 
the  particular  time  frame  being  analyzed,  and  <Mn)  ie  the  estimated 
power  spectrum  of  the  noise  (assumed  constant).  The  current  <J>w  ia 
mapped  by  an  inverse  Fourier  transform  into  the  time  domain 
function  Rw  which  is  the  autocorrelation  of  the  impulse  response 
of  (4>  )^.  Robinson’s  method  rM  is  used  to  map  Rw  into  a new  target 
set  of  k-parameters  which  determine  the  filtering  characteristics  of 
the  lattice  filter.  A corine  curve  has  been  chosen  to  interpret 
between  initial  and  target  k-parameters.  This  avoids  instantaneous 
change  In  thu  velosity  of  the  k-parameters  causing  the  frequency 
response  of  the  lattice  filter  to  converge  smoothly  to  the  current 
target  estimate.  The  noisy  input  signal  is  supplied  ae  a driving 
function  to  the  lattice  filter  which  then  filters  the  input  signal 

h 

with  smoothly  changing  estimates  of  (*w>  . 
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Present  limitation*  on  this  dynamic  fl  I taring  technique  reside 
In  the  fact  that  the  transfer  function  Is  approximated  with  an 

al  I -pole  model.  Since  zero  information  is  not  used  explicitly  m 
the  model,  a relatively  large  number  of  poles  are  required  ehould 
narrow  frequency  band  attentuation  be  desired.  Current  work  with 
this  filtering  scheme  ha3  been  done  with  90  poles  and  a maximum 
attenuation  level  of  -24db.  It  is  bellevsd  the  Inclusion  of  zero 
Information  Into  the  approximating  transfer  function  wl  M greatly 
increase  the  filtering  efficiency  of  th i 3 nodei. 

A means  for  extracting  vh l s zero  information  Is  a subject  of 
continuing  research  within  the  Sensory  Information  Processing  Group 
at  Utah. 
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Section  8 

Demonstration:  An  I 3olated-Uord  Recognition 
System  for'Flight  Management 
Mike  Coker 

8.1  Introduction 

There  le  currently  a program  of  research  at  Ames  Reeearch 
Center  (NASA)  on  pi  lot  procedures  and  pi  lot-system  Interfaces  til. 

Briefly,  the  goal  Is  to  move  touard  automation  of  flight  management 
and  air  traffic  control  processes,  and  simplification  of  tasks 
currently  performed  by  flight  management  personnel.  The  proposed 
system  would  Include  an  onboard  computer  to  perform  low-level  statue 
apprisement,  monitoring,  and  executive  functions.  As  a medium  to 
Input  commands  and  data  to  the  system,  speech  has  been  suggested  as 
being  most  natural  and  efficient  Cl). 


Toward  the  goal  of  designing  a system  for  isolated  word 
recognition,  e modified  version  of  an  algorithm  presented  by  Itakura 
[23  hae  been  Implemented.  Results  of  experiments  performed  ueing 
thle  algorithm  indicate  that  a high  recognition  accuracy  can  be 
obtained. 


8.2  Results 


The  method  described  belcw,  together  with  a non-linear 
time-warping  scheme  Implemented  with  a dynamic  programming  algorithm 
(23,  was  initially  applied  to  a ten  word  vocabulary  consisting  of 
the  digits  0 through  5.  The  single  speaker  accuracy  achievad  wae 
100X.  During  this  experiment,  It  was  discovered  that  a high  rate  of 
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recognition  accuracy  can  be  obtained  on  these  words  ueing  far  less 
signal  information  than  would  be  required  to  reconstruct  the  speech 
with  high  intelligibility,  as  required  in  a speech 
analysis-synthesis  system. 

This  work-recogn  i t i or,  scheme  was  subsequently  applied  to  a 
vocabulary  of  107  worde,  consisting  of  the  original  ten  digite  plue 
97  flight  commands.  The  single  speaker  accuracy  achieved  varied 
from  95  to  98%  depending  on  the  number  of  training  utterances  for 
each  word.  Computation  time  was  about  twelve  seconds  per  word. 

Two  ether  vocabularies  were  also  tried.  In  the  case  of  the  ten 
digits  plus  the  alphabet,  89  to  95%  accuracy  was  obtained  for  a 
a ingle  speaker.  In  thr  case  of  twenty-five  word  pairs  from  the 
Diagnostic  Rhyme  Test,  a standard  vocoder  intelligibility  test, 
accuracies  ranged  from  78  to  88%. 

8.3  Analysis 

The  most  Important  considerations  in  a word-recognition  system 
are«  (1)  recognition  accuracy!  (2)  computation  time}  and  (3) 
reference  pattern  storage  requirements. 

Given  that  an  effective  distance  measure  exists  for  comparing 
short-time  power  spectra,  it  is  apparent  that  recognition  accuracy 
depends  heavi ly  on  the  reliabi I i ty  of  the  algorithms  chosen  for 
detection  of  the  beginning  and  ending  of  each  utterance  (endpoint 
detection)  and  for  proper  alignment  of  the  speech  sounds  between  two 
utterancee  (time-  warping).  Endpoint  detection  is  normally  done  by 
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comparing  the  energy  (zeroth  autocorre I at i on  coefficient)  of 
eucceeelve  speech  segments  to  some  threshold  value.  Since  fricated 
phonemes  often  form  uord  endpoints,  which  are  sometimes  the  only 
difference  between  two  different  words  (i.e.,  singulars  vs. 
plurals),  endpoint  detection  is  done  on  the  differencee  epeech 
signal.  Because  pouer  spectra  derived  from  fi  icated  phonemes 
normally  have  much  less  total  energy  than  voiced  sounds  (wni  le 
having  relatively  more  energy  at  high  frequencies),  the 
differencing,  which  emphasizes  high  frequencies  and  attentuates  low 
frquencles  [41,  allows  more  accurate  endpoint  detection  on  both 
voiced  and  unvoiced  sounds. 

The  non- 1 1 near  tlme-warping  algorithm  matches  epeech  sounds 
between  words  to  be  used  eo  that  variations  in  the  way  they  are 
spoken  does  not  affect  recognition  accuracy.  It  is  apparent  that 
the  earns  distance  measured  which  is  used  to  compare  short-time  power 
epect^a  between  words,  could  also  be  used  to  separate  speech  sounds 
ui  thin  a single  word.  When  this  is  done,  the  reference  pattern  can 
be  constructed  using  only  significantly  different  pouer  spectra, 
thereby  reducing  the  pattern  size  and  thus  the  amount  of  computer 
etorage  and  computation  necessary  for  pattern  matching.  Also,  sound 
eeparat'un  within  a word  can  bs  done  using  a louer  order  linear 
prediction  model  than  the  actual  reference  pattern  formation,  so 
that  considerable  computation  time  can  be  saved. 

Mother  desirable  feature  of  the  word-recognition  system  is  the 
capability  to  form  a reference  pattern  based  on  more  than  one 
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utterance  of  each  vocabulary  word.  Th i 9 enables  the  system  to 
follow  long-time  variations  in  word  pronunciation. 

The  eyetem  was  implemented  in  real  time  on  the  Univereity  of 
Utah’ e elngle-ueer  POP-10  computer. 

8.4  Method 

The  linear  prediction  method  of  signal  analysis  provides  a 
convenient  distance  measure  between  short-time  (20-30  msec)  all-pole 
power  spectra  12,3),  thereby  permitting  segment-wise  comparisons 
between  Isolated  words.  Itakura  (21  has  used  this  distance  measure 
to  attain  97. 3%  recognition  accuracy  on  a vocabulary  of  200  words. 

The  modoi  for  linear  prediction  analysis  assumes  that  each 
signal  sample  In  a segment  of  N camples  can  be  closely  predicted  by 
a linear  combination  of  the  preceding  p samples  for  p < N/2, 
thus: 

P 

s = E < s , + e n=0, 1 , 2 , . . .N-l 

n , . n-k  n 

k=l 

where  (akl  k-1,2,...  p are  known  as  the  predictor  coefficients  and 
[en3  n«!,2e...  N-l  is  the  prediction  error  at  each  point.  The 
predictor  coefficients  are  found  by  minmizing  the  I inear  orediction 

residual  (LPR),  defined  by 

N-i 

LPR  = E e . 

„ n 


.x 
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For  a given  eagment  of  speech,  the  prediction  error  sequence  le 

g I var,  by 


P 

e = s - E a,  s 
n n , , k n-k 

k=l 


For  the  same  set  of  predictor  coefficients  Ca^ 3 and  a difference  set 
of  speech  samples  Cs n]  n-1,2,...  N-l,  the  associated  prediction 

error  sequence  is  given  by 

p 

e = § - E a,  s , 

n n , k n-k 
k=l 

Therefore,  a measure  of  similarity  between  the  two  speech  segments 
represented  by  [sj  and  [sn]  is  given  by  the  "closeness"  of  the  two 
numbers 


I i 


N-l 

N-l 

-2 

E e 

and 

LPR  = E 

e 

n=0  n 

2 „=0 

n 

or  since  LPF^  < LPR2>  how  close  the  ratio 


, lpr2 


le  to  unity, 

It  can  be  shown  [33  that  this  distance  measure  represents  a 

2 

comparison  between  the  power  spectrum  |A(oj)|  of  the  inverse  filter 
defined  by  the  predictor  coefficients  [a^3  (extracted  from  the  first 
speech  segment)  and  the  power  spectrum  P(u>)  of  the  second  speech 
segment  Cen] , in  this  way: 


N-l  TT/T 

E e = T/2tt  / P(w)  ! A ( uj)  I du> 
n=0  n -n/T 
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where  T le  the  sample  interval. 
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Section  9 


Word  Recognition  in  Continuous  Speech 
Richard  U.  Chrietianson 


9.1  Introduction  and  Background 


There  are  rather  extensive  efforts  already  underway  in  the 
areas  of  automatic  speech  recognition,  speech  understanding,  and 
word  recognition  using  digital  waveform  processing  [1],  Even  a 
casual  reading  of  this  recent  overview  of  the  current  work  Mil! 
convince  one  of  the  magnitude  and  extreme  difficulty  of  the  general 
speech  recognition  problem.  Some  of  the  key  problems  associated 
with  this  work  arn 

1.  Segmentation  of  incoming  speech  into! 

a.  Phonemes 

b.  Sy I lables 

c.  Words 

2.  Defining  beginnings  and  endings  of  words 

3.  Use  of  syntax  and  semantics 

4.  Classification  of  speech  segments 

5.  Time  registration— usual  ly  nonlinear 


However,  the  work  described  in  this  section  is  fundamentally 
different,  and  is  directed  toward  achieving  a different  objective 
than  the  general  speech  recognition  work.  This  difference  allows 
one  to  choose  other  ways  of  solving  the  problem  which  el  iminate  r.ost 
of  the  difficulties  inherent  in  the  general  solution.  Of  the  f i vo 
problems  referred  to,  only  the  time  registration  problem  remains. 
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Hero  we  are  not  trying  to  identify  an  input  work  or  determine 
its  meaning,  but  merely  determlre  if  a reference  word  ever  occurs 
uelng  simple  waveform  matching  techniques.  This  technique  may  be 
described  ae  a time  flexible  template  matching  approach  in  which  the 
input  speech  is  analyzed  one  frame  at  a time,  and  time  registered  to 
the  stored  reference  template  using  a dynamic  programming  algorithm. 
At  the  completion  of  the  analysis  using  each  frame,  a test  ie  made 
to  dstermlns  If  th*t  frame  formed  the  last  frame  of  the  word 
correepond I ng  to  the  template  word.  A frame  similariiy  function,  a 
threshold,  a parameter  for  weighting  previous  delta,  and  a time 
distortion  penalty  constant  are  used  in  the  dynamic  programming 
algorithm  which  is  described  later.  Although  the  final  result  c<f 
this  effort  may  not  yield  a fully  automatic  system,  it  ehould  still 
produce  a practical  working  system  which  uses  some  human  interaction 
but  etill  accomplishes  the  main  objective. 

9.2  Objective 

There  were  three  main  objectives  of  this  research.  They  were 
ae  foil  owe: 


1.  Given  a recording  of  someone  talking,  i.e.,  a 
continuous  speech  recording,  and  a reference  examp  I 3 of 
the  same  person  speaking  a word  from  the  continuous  speech 
text,  then  Identify  whenever  the  reference  word  occurs  in 
the  recording  using  digital  signal  processing  techniques. 

2.  Provide  a statistical  evaluation  of  the 


I 
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performance  of  the  system,  an  explanation  of  why  it  works 
when  It  does  and  fails  when  it  does  which  can  be  related 
back  to  the  acoustic  model  for  speech,  a measure  of  the 
robustness  of  the  3ystem  which  characterizes  the 
sensitivity  to  "noise,"  small  changes  in  parameters, 
"quality"  of  recording  system  and  environment,  etc.,  and 
in  general,  advance  the  understanding  of  speech  waveform 
matching  using  LPC  parameters. 

3.  Provide  a preliminary  evaluation  of  the  system  for 
the  case  of  using  multiple  templates  both  from  the  same 
speaker  and  from  several  different  speakers. 

9,3  Qaslr  Word  Finding  Algorithm 

As  a possible  basic  approach  to  locating  a word  in  continuous 
text,  one  could  choose  to  segment  the  incoming  speech  into  a string 
of  phonemes,  classify  the  phonemes,  and  then  identify  the  word  based 
on  this  classification.  This  approach  has  ssveral  major  problems, 
among  which  the  most  serious  are  segmentation  and  classification  of 
phonemes. 

As  another  approach,  one  could  view  a word  as  a waveform  with 
certain  fundamental  structure  or  spsctral  shape,  and  then  design  a 
filter  matched  to  that  wav°  shape.  This  approach,  common  in  radar 
work,  is  called  the  matched  filter  approach.  The  problem  with  this 
approach  is  that  of  time  regi  stration.  That  is,  any  word  may  be 
spoken  at  a different  speed  each  time  it  is  spoken  and,  in  fact, 
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with  different  parte  of  the  word  spoken  at  different  speeds  each 
time. 


To  use  this  approach  one  needs  a measure  of  spectral  matching 
and  some  type  of  nonlinear  time  uarping  p,  ucedure  to  somehow  measure 
how  ue!l  the  template  word  matches  the  word  in  the  continuous 
speech. 

In  searching  for  clues  to  solving  these  problems,  it  was 
discovered  that  Bridle  [21  had  achieved  85  percent  eucceee  at  uord 
recognition  using  a 13-channel  vccodBr  and  a rather  clever  dynamic 
programming  scheme.  His  procedure  uses  a frame  similarity  function 
derived  by  performing  a cosine  transformation  on  a 19~point 
logarithmic,  short-term  pouer  spectrum.  This  yields  Borne  spectrum 
shape  coefficients  which  are  then  ueiyhted  using  empirically  derived 
weighting  coefficients.  A squared  distance  between  frame  a and  b is 
then  formed  and  transformed  into  the  frame  similarity  number  c so 
that  0 < c < 1.  He  then  performs  the  time  registration  using 
dynamic  programming.  To  do  this  he  defines  a local  similarity 
function  AR  which  gives  a measure  of  hot  similar  the  incoming  speech 
is  to  the  template  on  a frame-by-frame  basis.  The  local  similarity 
function  is  a weighted  sum  of  the  frame  similarity  functions  along 
the  registration  path  to  the  current  position. 

This  local  similarity  function  includes  a factor  which  applies 
a penalty  if  there  is  time  distortion  and  no  penalty  if  there  is 
not.  (See  Apppendix  A.) 
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There  were  a number  of  attractive  points  in  Bridle’s  work,  some 
of  which  are  listed  below. 

1.  The  dynamic  programming  routine  is  simple. 

2.  Any  other  frame  similarity  function  which  ie 
bounded  by  0 and  4-1  could  be  used. 

3.  It  performed  the  nonlinear  time  registration  with 
one  input  frame  at  a time. 

4.  It  worked  quite  well  using  the  channel  vocoder. 

It  was  therefore  decided  to  use  the  inverse  of  the  linear  prediction 
ratio  given  in  Equation  58  in  a memo  by  Boll  13]  as  the  frame 
similarity  function  and  to  apply  the  dynamic  programming  exactly  as 
given  by  Bridle. 

Now  the  following  things  should  be  noted  concerning  this 
a Igor  I thm: 

1.  It  is  a recursive  algorithm  which  computes  I max 
values  for  an  array  AR,  where  I max  ■ number  of  analysis 
frames  In  the  template. 

2.  SIM  Is  a function  which  describes  the  similarity 
between  a frame  of  the  template  and  the  incoming  frame. 

3.  AR  Is  a local  similarity  function  which  Is  really 
a weighted  sum  of  the  frame  similarity  functions  along  the 


path  to  the  current  po3ition. 
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4.  The  local  similarity  at  the  first  point  on  a path 
ie  defined  co  be  equ~l  to  the  frame  similarity  at  that 
point. 

5.  A time  distortion  penalty  ie  applied  so  ae  to 
allow  poorly  matching  frames,  providing  they  occur  between 
well  matched  regions,  and  to  provide  a control  parameter 
to  limit  or  reduce  the  contributions  due  to  "excessively 
large"  time  distortions- 

6.  At  the  beginning  of  the  process,  the  AR  array  ie 
zero.  Then  If  the  input  speech  begins  to  match  the 
template,  nonzero  values  begin  to  propagate  through  the  AR 
array.  If  ARdmax,  j)  j 0,  then  ARUmax,  j)  is  identified 
as  the  ending  frame  of  the  uord.  If  instead  the  input 
speech  begins  to  match  the  template  and  then  does  not 
match  wel  I for  succeeding  frames,  the  recursive  nature  of 
tha  algorithm  successively  sets  the  elements  of  ■ he  AR 
array  back  to  zero. 

7.  The  array  AR  contains  all  necessary  information 
about  previous  input  speech  and  how  well  It  has  matched. 

9.4  Current  Status  of  This  Research 

A preliminary  version  of  this  algorithm  has  been  implemented  on 
the  POP-10  system  and  limited  testing  has  been  conducted  using  words 
spoken  in  isolation  as  well  as  continuous  speech.  This 
implementation  ie  shjun  in  block  diagram  form  in  Figures  9.1  and 
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9.2,  with  the  algorithm  and  basic  Fortran  program  given  in  Appendix 
A.  The  program  runs  in  about  12  times  real  tima, 


For  preliminary  evaluation  purposes,  ten  utterances  of  each  of 
the  two  words  "octopus"  and  "carnival"  and  six  utterances  of  the 
word  "interrupt"  were  digitized  and  recorded  in  isolation.  In 
addition,  approximately  two  minutes  of  text  from  a PDP-11  manual  was 
digitized  and  recorded.  A single  occurrence  or  each  of  the  worde 
"program,"  "processor,"  and  "priority"  was  located  in  the  text  to  be 
used  as  templates  to  locate  other  occurrences  of  the  words. 


Frequency  of  occurrence  of  these  words  in  the  text  is  given  in 
Table  1. 


Table  1.  Word  list. 


Word 

Occurrence 

Intel rupt 

18 

Program 

10 

Processor 

7 

Priority 

6 

The  results  are  summarized  in  Table  2,  where  a!  I template  worde  are 
epoken  by  the  same  spsaker  as  the  "text." 


Horo  it  should  be  noted  that  the  length  of  the  interrupt 
template  ie  approximataly  .6  seconds  while  the  continuous  speech 
versions  were  as  short  as  .3  seconds. 
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In  contrast,  Bridle,  U9ing  templates  taken  from  the  ten 
reports  85  percent  hits  with  six  false  alarms  per  hour.  To  achieve 
this,  ha  used  as  many  as  sight  different  thresholds  for  each 
analysis  and  had  five  different  speakers.  At  this  point,  there  is 
no  realistic  basis  for  comparison  since  I have  no  reasonable  or  even 
comparble  statistics. 


In  addition  to  the  above  results,  a test  run  was  also  made  on  a 
recording  by  a second  speaks."  using  the  template  from  Ihe  first 
speaker  and  the  same  Q,  G,  and  RK  values.  The  results  were  18  hits, 
0 misses,  and  2 false  alarm3. 

Table  2.  Preliminary  experimental  results. 


Hits  Misse 


False  Alarms 
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All  these  prel Inary  re3ults  clearly  demonstrate  the  soundness 
and  feasibility  of  this  basic  approach.  The  preliminary  success 
using  a template  from  a different  speaker  also  suggests  great 
promise  for  solving  the  problem  using  templates  from  the  same 
speaker  or  from  several  different  speakers.  In  other  words,  it  may 
be  possible  to  construct  a 3et  of  templates  of  a word  spoken  several 
different  ways  by  several  different  speakers  which  wou'd  allow 
identification  of  thm  word  using  text  from  many  other  speakers. 
This  procedure  will  be  detailed  in  the  next  report,  along  with  the 
-osulte  to  date. 


p 
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Appendix  A 

Word  Finding  Aigori thm 

Let  a frame  eimilarity  function,  SIM,  oe  definod  by: 


where 

R.^  • autocorrelation  coefficient  matrix  for  a frame  of  uindoued 
■peach  aamplee  'or  the  template 

a^  • llnnar  predictor  coefficient  vector  for  a frame  of  the 
template  ep'jech 

a ^ * linear  predictor  coefficient  vector  for  the  incoming 
epeech  frame 

Thf/n  f;>r  the  dynamic  programming  algorithm,  we  can  define  a function 
AR  ( 1 , J)  ae  fol I owe: 

AR(i,J)  - max  StepKARti  - a,  j - b) , SIMCi.j),  K(a,b)) 

(a,b) 

AR ( I , j ) - max  [if  SIM(i.J)  < C,  then  0,  eiee  SIM(imJ), 

Step  (AR  ( I , j - 1),  SIMd.j),  RK)  3 

where 


Step  (AR,  SIM,  RK)  - I f AR  - 0,  then  0 


» > > 
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else  If  (1  - G)AR  + GRKSIM  < Q,  then  0 


else  (1  - OAR  + GRKSIM 


(a,b)  takes  values  (1,0),  (1,1),  (0,1) 


K(a,b)  takes  corresponding  values  RK,  1,  RK 


0 < Q < 1 


Q - threshold  for  similarity 


0 < G < 1 


G - previous  data  weighting  "time 


const1 


0 < RK  < 1 


RK  ■ time  distortion  penalty  factor 


I " 1 f J 
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Section  10 

Speech  Proceeeing  to  Remove  No  tee 
and  Improve  Intelligibility 
Michael  Callahan 


A new  method  of  tpeech  proceeeing  ie  being  Inveetlgated  which 
hae  applicatione  to  noise  removal  and  intelligibility  improvement. 
The  Method  io  based  on  homomorphic  processes  which  have  previously 
been  used  fcr  picture  deolurring  and  deresonation  of  acoustic 
signals.  The  basic  method  can  be  summarized  as  foilous: 


1.  The  short-time  spectrum  of  the  original  signal  is 
calculated.  The  short-time  spectrum  is  two-dimensional 
and  shows  the  frequency  content  of  the  signal  as  a 
function  of  time. 


2.  The  short-time  spectrum  is  altered  to  remove 
effects  due  to  noise  or  to  amplify  speech  features  using 
methods  such  as  thresholding  or  two-dimensional  linear 
f I I tering. 


3.  A new  acoustic  signal  ie  reconstructed  from  the 
modified  short-time  spectrum. 


This  type  of  processing  is  often  more  effective  than 
conventional  methods  because  of  the  Inherent  flexibility  of 
two-d Intentional  processing,  and  because  of  the  similarity  of  the 
method  to  processing  performed  by  the  human  auditory  system. 


Short-time  spectrum  processing  has  been  quite  successful  in  two 


initial  experimental  removal  of  background  noiee  (signal  to  noise 
ratio  about  30db) , and  removal  of  high  level  signals  with  strong 
harmonic  structure,  such  as  60Hz  square  wave  noise  (signal  to  noise 
ration  about  -26db) . Both  of  these  experiments,  which  were 
discussed  in  some  detail  in  the  last  report,  are  complete. 

Present  research  concerns  more  general  applications  of  this 
process.  The  flrot  step  la  to  develop  the  capability  to  extract 
from  the  short-time  spectrum  the  acoustic  features  of  speech  knoun 
to  bs  Important  io  perception.  This  ability  is  important  for 
several  reasonet  a)  it  will  give  a better  understanding  of  the 
effect  of  noise  removal  processes  on  the  underlying  speech;  b)  it 
may  provide  the  basis  for  pre-processing  speech  to  enhance  Important 
features  so  that  the  speech  will  be  more  intelligible  in  a no  ley 
environment;  and  c)  It  may  suggsst  improved  compression-expansion 
techniques  when  the  speech  must  bs  passed  through  a noisy  channel. 

An  example  of  this  process  is  shown  in  Figures  10.1,  10.2  and 
10.3.  Figure  10.1  Is  the  short-time  spectrum  of  the  words  "we  were 
auay  a uear  ago."  The  vertical  axis  is  frequency  (0-5000  Hz),  the 
horizontal  axis  Is  time  (l.S  sec),  and  brightness  is  proportional  to 
the  magnitude  of  the  short-time  spectrum.  Figure  10.2  shows  the 
speech  formants  for  the  words  in  Figure  10.1.  Formants  are 
prom  Inant  peaks  in  the  speech  spectrum  caused  by  resonances  in  the 
vocal  tract  for  the  speaker — the  frequency,  inteneity  and  bandwidth 
are  Important  to  intelligibility  and  naturalness.  The  formants  were 
obtained  from  the  sentence  of  Figure  10.1  by  filtering  the  logarithm 


i 


of  the  magnitude  of  the  short-time  npuctrum.  Figure  10. 3 shows  the 
original  sentence  after  the  formants  have  been  enhanced. 
Preliminary  experiments  indicate  that  speech  processed  in  thie 
manner  is  more  intelligible  in  a noisy  environment. 
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Linear  Predictive  Coding  with  Zeros  and  Glottal  Uave 

t.K.  Timothy 


11.1  Introduction 


The  quality  of  synthesized  speech  in  Linear  Predictive  Coding 
(LPC)  ••  presented  by  Atal  Cl] , euffers  for  at  least  two  reaeonei 

1.  Improper  treatment  of  the  glottal  pulse  or  wave. 

2.  Lack  of  zeros  in  the  all-pole  mathematical  model 
which  are  needed  for  nasal  sounds  and  radiation  effecte. 

For  comparative  purpoeee.  Figure  il.l  presents  a natural  voiced 
speech  uaveform,  while  Figure  11.2  presents  a corresponding  waveform 
syntheeized  by  the  Atal  method.  Figure  11.3  presents  a 
correspond) ng  waveform  synthesized  by  the  proposed  method  of  thie 
paper. 

For  speech  synthesis,  rather  than  exciting  an  all-pole  filter 
with  an  impulse  as  Atal  did,  the  filter  is  excited  with  an  estimated 
glottal  wave  sequence  modified  by  zeros  (E^}  as  indicated  in 
Equation  (1). 


s = E a . s . + E 
n . , l n-i  n 
1=1 


E b.  g 
. , l n-i 
i=l 
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The  b coefficient*  are  the  zero  coefficient*,  and  the 

i 

sequence  {g^}  i * in  assumed  glottal  wavs.  The  sequence  (Ej)  is 
actually  the  error  signali  however,  in  this  application  only  that 

part  of  the  error  signal  when  the  glottis  is  judged  to  be  open  is 

used,  the  remainder  set  to  zero.  Instead  of  transmitting  the  error 

eignal,  the  ti  coefficients  (or  coefficients  for  the  zero 
polynomial)  are  estimated  based  upon  an  assumed  glottal  wave 
sequence  {g^}  and  transmitted.  The  error  signal  is  recreated  in  the 
receiver  from  the  b *s  and  the  assumed  {g.}  • 


The  sequence  of  mathematical  operations  for  voiced  speech 
follows.  Unvoiced  speech  is  treated  as  recommended  by  Atali 

1.  The  Interval  of  time  during  a pitch  period  when 
the  glottis  is  judged  to  be  open  is  estimated  which 
requires  pitch  synchronous  information.  Figure  11.1 
illustrates  the  idea. 

2.  A ueighted  least  squares  estimate  of  the  predictor 
(or  reflection)  coefficients  is  made  based  upon  only  thoee 
portions  of  the  speech  wave  when  the  glottis  is  closed. 

3.  An  error  signal  is  formed  as  indicated  in  Equation 
(1)  using  actual  speech  wave  dat»>  {Sj}  . 


4.  The  zero  coefficients  b^  are  estimated  from  only 
that  part  of  the  error  signal  that  corresponds  to  when  the 
glottis  is  judged  to  be  open  as  illustrated  in  Figure 


11.4. 
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Although  zero  coefficients  and  the  error  signal  must  be 
calculated  In  addition  to  the  usual  Atal  calculations,  fewer 
predictor  coefficients  need  be  calculated.  Conaequently  no 
significant  Increase  in  the  number  of  computer  operations  is 
expected,  and  real  time  Implementation  is  expected.  Real  time 
synchronous  pitch  detection,  which  operates  with  few  errors,  wae 
reported  in  Reference  2. 

11.2  Zaroa  and  the  Glottal  Uava 

In  the  mathematical  model  of  the  epeech  wave  presented  in 
Equations  (1)  and  (2),  the  zeros  act  only  on  the  forcing  function, 
the  glottal  wav*,.  Consequently,  according  to  the  model,  the  zeros 
play  an  active  role  only  uhen  the  glottis  is  open.  The  zeros  only 
Indirectly  affect  the  ballistic  or  free  portion  of  the  epeech  wave 
when  the  glottis  is  closed  by  modifying  the  glottal  wave  which  in 
turn  adjusts  the  initial  condition  of  the  ballistic  portion  at  the 
point  when  the  glottis  closes.  The  error  signal,  therefore,  is 
approximately  the  glottal  wava  modified  by  zeros  of  thv  system. 

11.3  Glottal  Interval 

The  interval  of  time  during  ?,  pitch  period  when  the  glottis  is 
judged  to  be  open  terminates  at  the  absolute  maximum  value  of  the 
epeech  uave  as  indicated  in  Figure  11.1.  The  beginning  of  the 
glottal  Interval  is  judged  to  be  two  zero  crossings  prior  to  the 
terminal  point.  Flanagan  [31,  estimates  the  interval  to  vary  from 
30  to  70  percent  of  the  pitch  period  which  will  usually  be  different 
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from  thie  assumption.  However,  the  quality  of  the  eyntheeized 
epeech,  baeod  upon  the  above  aesumptione,  ie  very  good. 

11,4  Suppress  I on  of  Distorted  Data 

Iri  the  frequency  domain  analysis  of  epeech,  the  suppression  of 
unwanted  data  le  a very  difficult  problem.  By  utilizing  weighted 
least  squares  estimation  in  a time  domain  analysis,  the  suppression 
of  unwanted  data  becomes  a strai ghtforward  process  as  wi I I be 
demonstrated  below. 

In  LFC,  a linear  regression  analysis  using  least  squares 
estimation  is  used  to  estimate  predictor  coefficients  in  vector 
form. 


from  sampled  speech, 


The  leaet  squares  eetimate  of  the  predictor  coefflciente  can  be 


wri tten  as 


■ («\)  i 


i 


\ 
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where  the  H matrix  contains  sampled  data  ae  followei 


a - [Vl  i ~n-2  i •••  i Vp] 


The  HtH  matrix  ie  the  autocorrelation  of  autocovariance  matrix. 


The  weighted  least  squares  estimate  of  the  predictor 


coefflclente  can  be  written  as 


-1  .T  -1 


» / T -1  \"  HR  s 

a = (h  R H ) ~n 


where  R normally  would  be  the  covariance  matrix  on 


the  estimation 


e'-ror. 


R - E ee 


: r = s - £ a.s 

1 n i-i  1 n'x 


If  R -1  ■ I , the  Identity  matrix,  then  the  weighted  estimate, 
Equation  (7),  would  be  identical  to  Equation  (5),  the  unweighted 


east  equares  estimate.  If  one  chooses  to  throw  out  some  data  and 


retain  other  data  with  equal  weighting,  the  & 1 can  be  chosen  ae 


fol iowsi 


I * 0 1 0 

~1  I I 


— h-  — 


i r1  = o 1 o 1 o 

I I 


o . 0 I I,  etc. 


l i 


ft 


The  R * expression  in  Equation  (b)  is  diagonal  with  ones  or  zeros  on 

MTV 


the  diagonal . The  ones  correspond  to  the  data  to  be  retained,  and 
the  zeros  correspond  to  the  data  to  be  discounted. 

If  tho  H matrix  is  partitioned  to  correspond  to  Equation  (8)  as 
fol tousi 


H.  = 

the  autocovar ance  matrix  H H can  be  uritten  as 

hTh  = Ij,  + h*  ° Hg  + 53  h ^3  + ••• 

= hJ  H,  + H.  + etc.  (9) 

"1  '~1  ~'3  ~3 

T 

Equation  9 generally  requires  fewer  multiples  than  does  H^H  when  a I I 
of  the  data  are  retained  in  the  usual  LPC  situation. 

The  HTR_1  a terms  are  calculated  as  a subset  of  Equation  (9). 

— ■ — ~n 

11.5  Estimation  of  Zeros 

The  polynomial  coefficients  for  the  zeros  b^  can  be  deconvolved 
from  the  glottal  portion  of  the  error  wave  ee  represented  in 
Equation  (2)  as  follows.  Equation  (2)  can  bo  expressed  as  a system 


— **-.  **a>  ' 
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i 

of  equations  expressed  in  Equation  (10) « 


E 

n 

Vi 

gn-2 

gn-r  " 

v 

En-1 

gn-2 

gn-3 

— 

gn-r-l 

b2 

• 

t 

« 

(10) 

• 

E 

n-q 

gn-q-l 

gn-q-2 

• • • 

g 

• 

>_ 

L 

A glottal  wave  is  assumed  which  forms  the  g.^  elements.  As 
Equations  (1)  and  (2)  indicate,  p poles  and  r zeros  are  employed  in 


i 


L 


the  modal.  In  more  compact  matrix  form,  Eq.  (10)  can  be  written  ae 


E = Gb 


(11) 


where  g,  G,  and  ^correspond  to  the  matrices  in  Eq.  (10).  It  is 
aeeumed  that  q + J > r.  Consequently  least  squares  calculation  of 
the  zero  coefficient  may  be  used  which  is 


b = feTG)  GTE 


(12) 


S’nco  the  glottal  wave  shape  is  assumed  known,  the  elements, 
d...  of  D - {GTG}-1  GTcan  be  precalculated  and  stored  in  computer 

^ j ? w—  ^ 

memory  such  that 


b.  = E d.  . E. 
1 . , ID  D 

]=1  J J 


(13) 
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The  glottal  wave  is  assumed  to  be  a raised  cosine  wave. 
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Section  12 

Phaee  bat i mate  of  a Linear  Systeij| 
bg  Blind  Deconvolution 
Chian  H.  Chiang 

12.1  The  Problem  Description 

In  eplte  of  the  Ohw’e  Law  on  acoustics  ()),  many  studies  [2,31 
have  ehown  that  the  phaee  distortion  does  cauae  audible 
deterlorat Ion  of  speech  and  music  qualitg.  Ue,  therefore,  make  the 
hypothesis  that  the  residual  reverbrant  character i st i cs  in  an  old 
recording,  after  having  been  deconvolved  for  magnitude  compensation, 
ie  caused  bg  the  phaee  distortion  associated  with  the  old  recording 
technology.  Hence,  it  ie  possible  to  further  improve  the  audio 
quality  in  an  old  recording  by  proper  phase  compeneation  (or 
deconvolution)  if  one  can  aleo  estimate  the  phase  of  the  recording 
system  which  was  ueed  in  making  the  recording. 

Since  the  only  data  available  ie  the  phase  dietored  waveform, 
both  the  original  waveform  and  the  distorting  system  are  unknown, 
thie  task  of  estimating  the  phase  of  the  linear  system  can  also  be 
regarded  ae  "blind  deconvolution." 

12.2  The  Theoretical  Background 

The  recorded  signal  v(t)  is  related  to  the  original  waveform 
e(t)  by 

v ( t ) - e(t)  ©h(t) 

I 

k 
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where  h(t)  repreeente  the  linear  system  used  in  making  the 
recording. 

Taking  the  Frurier  transform  producing 

Vif)  - S(f)  • H(f ) (2) 

Further,  taking  the  complex  logarithms  of  both  sides  of  (2), 
one  obtains 

log  |V(f)  | • logj  S(f  )|  + log  |H(fl  | (3) 


and 


/V (f ) - /S(f)  + /H(f) 


(4) 


Equation  (3)  relates  the  magnitudes  and  (4)  represents  the 
relation  between  the  phases.  The  main  objective  here  ie  to 
estimate  /}H(f). 

One  approach  might  be  to  take  several  recordings  made  with  the 
same  recording  equipment,  calculate  the  phase,  put  each  one  in  the 
form  of  (4)  and  average  both  sides  of  these  equations  across  all  of 
the  recordings,  namely 

N N 

- t Zv.(f)  = - E *S.(f>  + LH(f)  (5) 


■ 


I 

I 


, 


recordinge  and  the  speech  or  signing  on  each  recording  were  quite 
different,  then  one  might  e*Dect,  according  to  the  central  limit 
theorem,  that  the  right  hand  side  of  (5)  will  converge  to  /_H(f). 
The  di f f i cut  tg  is  that  it  is  hard  to  corns  by  enough  recordinge  which 
are  Known  to  be  made  with  the  same  equipment.  The  way  to  get  around 
the  problem  is  tc  chop  up  one  recording  into  sections  and  take  the 
phases  of  these  sections  as  the  ensemble  on  which  to  carry  out  the 

[ I 

averaging.  Thus,  it  might  bs  possible  to  estimate  the  phase  of  the 
linear  system  by  a three  step  process:  1)  average  the  phase  of  the 
dietorted  waveform:  2)  average  the  phase  of  a similar  acoustic  wave 
which  hae  not  been  phase  distorted  (probably  by  using  a good 
recording  of  "identical"  speech  or  singing);  and  then  3)  subtracting 
between  the  two. 

12.3  The  Difficulties  Encountered 

Two  difficulties  arise  when  one  tries  to  implement  the  process 
described  above: 

a)  The  Phase  Unwrapping. — The  digital  computer  utilizing  the 
conventional  four  quadrant  inverse  tangent  function  routine  computes 
only  the  principle  value  of  the  phase.  Unfortunately,  the  sum  of 
two  principle  values  doss  not  give  the  principle  value  of  the  eum. 
So  one  has  to,  somehow,  find  the  actual  phases  before  he  averages 
them.  An  algorithm  by  Schafer  [4]  is  to  decide  the  number  of  jumps 
of  2 it  so  that  one  can  obtain  the  actual  phase  from  its  principle 
value  veraion.  The  process  is  called  phase  unwrapping  and  both 


\ 
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Schafer  and  Oppenhelm  ISi  have  discu^eJ  this  topic  elsewhere.  The 
phaea  unwrapping  echeme  ofy  achafer  ^1  I not  work  when  there  are 
sudden  energy  dlpe  In  the  spectrum.  Unfortunately,  this  is  just  the 
caae  when  the  ties  waveform  is  periodic,  as  is  true  for  speech.  The 
dlpa  cause  the  spectrum  to  become  "discontinuous"  in  some 
frequencies. 

To  gat  the  beet  unwrapping  result  using  Schafer’s  method,  the 
phase  hat  to  ba  computed  from  a data  frame  so  small  that  the  fine 
detail*  (the  energy  dips)  will  not  appear  in  the  spectrum. 


b)  The  Reference  Points  in  the  Time  Waveform. — Many  people  have 
noticed  that  a slight  shift  in  the  relative  position  of  the  window 
with  reepect  to  the  waveform  causes  a tremendous  change  in  the 
unwrapped  phase  (Gl . This  phenomenon  is  undesirable  especially  when 
the  etatletlc  property  of  the  phase  is  of  major  concern. 


To  circumvent  the  effect  of  this  change,  we  "synchronize"  the 
waveform  prior  to  the  taking  of  FFT  in  such  a manner  that  the 
waveform  will  aiuays  have  the  peaks  located  at  the  beginning  point 
of  the  FFT  array.  In  other  words,  we  take  FFr  in  a synchronous 
faehlon  with  reepect  to  the  peak  oT  the  time  waveform. 


It  turne  out  that  using  this  synchronization  scheme,  the  phose 
calculated  becomes  statistically  stabilized.  Thus,  we  hope  to  avoid 
thts  bad  etatletlc  behavior  noted  by  other  researchers. 


12.4  The  Experiment  and  The  Result' 
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An  Inltlfl'  experiment  was  carried  jfyjt  to  Illustrate  the  method 
cf  eetimatfng  the  phase  of  the  linear  system  given  only  the 
distorted  waveform. 

A epeech  passage  by  the  male  subject  number  1 readirq  from  a 
text  le  digitally  recorded  and  the  convolved  with  an  all-pase 
digital  filter  which  simulates  the  phase  distorting  process  in  tne 
old  recording.  The  phase  of  the  filter  was  designed  to  have  a 
linear  ramp  phasy  dispersion  with  the  maximum  value  of  3 ms  at  the 
nyquiet  frequency.  This  distorted  speech  was  used  as  If  it  were  the 
only  data  available  with  which  to  etart  with  the  estimation  process. 
Processing  this  data  yields  the  left  hand  (first)  term  of  (5). 

In  order  to  reveal  the  second  term  on  the  right  hand  eide  of 
(5),  one  needs  an  isolated  version  of  the  first  term  in  the  right 
hand  eide.  To  supply  this,  a second  speech  passage  is  recorded  by 
the  male  subject  number  2 reading  from  the  same  text  as  the  previous 
recording,  and  this  recording  is  regarded  as  the  prototype. 
Processing  of  this  data,  then,  yields  the  second  term  of  (5). 

The  processing  yielded  the  average  phase  of  both  recordings,  eo 
the  estimate  of  £H(f)  was  obtained  by  subtraction.  The  assumption 
made  here  ie  that  the  average  phase  of  the  prototype  recording  has 
the  same  mean  value  as  does  the  original  speech  to  be  restored.  The 
estimated  and  the  actual  phases  (a  parabolic  curve)  of  the  eyetem 
are  compared  in  Figure  12.1. 


12.5  Discussion  and  Future  Work 
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The  result  Is  still  of  little  practical  applicability  for  the 
following  reasons:  1)  It  ie  notjjtelear  what  caused  the  estimated 
phase  of  the  system  to  roll  off  from  the  actual  value  in  the  higher 


frequency  portion  as  shown  in  Figure  12.1;  2)  Since  the  length  of 


the  data  window  has  to  be  limited  to  obtain  better  unwrapping 


result,  the  amount  of  detectable  phase  dispersion  is  also  limited  to 


Just  a few  ms.  This  means  the  forementionsd  process  will  not  yield 


a correct  estimate  of  system  phase  if  this  phase  dispersion  were 


more  than  about  5 ms. 


The  future  research  will  be  devoted  to  finding  a way  to 


calculate  the  reliable  unwrapped  phase  without  the  restriction  of 


using  small  windows.  If  such  is  shown  to  be  practical,  the  method 


will  be  applied  to  appropriate  test  cases. 


ft/ 
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Section  ^ 

Other  ^ 

Final  Computer  Science  Department  technical  reports  bg  Rom 
(UTEC-CSc-75-115)  and  Ingebreteen  (UrEC-CSc-75-118)  are  In 
preparation. 

A report  entitled  "Cepstral  Prediction  Analysis  of  Digital 
Waveforms,"  by  All  Atashroo,  was  presented  at  the  IEEE  75  Region  Six 
Western  Conference.  This  report  describes  the  continuing  attempt  to 
Incorporate  zeroe  ae  well  as  poles  into  a spectral  model  to  match  a 
given  spectrum.  Knowing  the  impulse  train  response  of  a time 
invariant  digital  filter  with  a rational  system  function,  the  paper 
describes  a proceee  that  identifies  the  digital  filter  in  terms  of 
the  location  and  order  of  Its  poles  and  zeroes. 


PUBLICATIONS  AND  PRESENTATIONS 
Sensory  Information  Processing 
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SYMBOLIC  COMPUTATION 
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This  report  summarizes  the  work  of  the  group  during  the  period 
of  January  1,  1975  through  June  30,  1975.  The  Group’s  research  la 
directed  toward  the  development  of  software  techniques  for  the 
eoiution  of  a wide  range  of  symbolic  and  algebraic  problems. 


During  the  last  six  months,  the  work  of  the  group  hae  been 
directed  into  the  following  areas: 


Section  1 

Development  of  a Mode  Analyzing  Algebraic 
Simplification  Program 


In  the  laet  two  semi-annual  reports  we  described  a new  mode 
analyzing  version  of  REDUCE  and  our  progress  towards  ite 
Implementation.  A basic  REOUCE  system  has  been  completely 
implemented  ueing  these  ideas  of  a complete  mode  analysis  following 
parsing  and  only  some  aspects  of  the  pattern  matching,  vector  and 
high  energy  physics  packages  rema'n  to  be  checked  out  before  a 
complete  REOUCE  system  can  be  released  locally  for  extensive 
experimentation  and  testing. 

The  Implementation  of  a user  data  definition  facility  ie 
sufficiently  complete  to  permit  all  the  features  of  a rather  simple 
type,  such  as  COMPLEX,  to  be  defined  quite  easily.  This  includes 
coercions  to  and  from  other  types,  default  values,  printing 
functions,  and  all  the  basic  operations.  A completely  satisfactory 


\ 
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definition  of  a recursive  type,  such  as  a polynomial,  le  almost 
within  our  grasp,  but  still  rt>quires  some  classification  of  the 
UNION  mode.  ' ^ 

He  believe  that  a completely  working  version  of  REDUCE, 
including  an  integrated  data  definition  facility,  will  become 
available  during  the  next  six  monthe,  and  we  will  then  concentrate 
on  rewriting  REDUCE  using  the  power  of  the  new  system,  and  extending 
the  data  definition  mechanism. 

Section  2 

Research  in  Independent  Comp  I ler  and 
Interpreter  Design 

The  development  in  th i 3 area  over  the  past  six  months  has  been 
particularly  exciting.  As  reported  previously,  we  hava  extensively 
rewritten  and  improved  the  basic  LI SP/3S0  compiler,  isolating  the 
code  generation  into  the  production  of  1/  fairly  machine  independent 
macros.  The  latest  distributed  version  of  REDUCE  for  the  IBM 
360/370  series  includes  this  compiler,  written  entirely  In  REDUCE 
(11.  Recently,  a more  sophisticated  version  for  the  sume  compiler 
hae  been  able  to  produce  executable  code  for  the  PDP-10,  after 
rewriting  the  macros.  The  cods  produced  compares  extremely  well 
with  the  existing  LISP  1.6  compiler  121.  In  the  last  few  weeks,  we 
have  Indications  that  we  can  p-oduce  even  better  code,  aid  all  the 
optimizations  performed  should  reflect  Immediately  in 
correspondingly  better  code  for  the  IBM  machines  as  well.  We  will 
in  the  next  few  months  try  this  compiler  on  the  360  to  confirm  the 
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effect Ivenees  of  the  machine  independent  optimizations,  and  have 
also  bogun  an  Implementation  of  the  macros  for  the  UNIVAC  1108  LISP 
system. 

The  work  on  portable  LISP  Interpreters  is  progress'!  8g 
eatlefactorl ly,  with  a complete  pseudo-code  model  now  executing  on 
the  PDP-10  131.  This  work  is  being  written  up  as  a report,  and  then 
an  attempt  to  transfer  the  program  to  a different  machine  will  be 
ini tiated. 

A eecond  approach,  using  the  ideas  of  abstract  machine  modeling 
[41  , ehould  ultimately  lead  to  a portable  LISP  system  that  can  t-v.e 
full  advantage  of  the  target  machine,  and  so  produce  a faster  eyotem 
than  the  pseudo-code  interpreter  model.  An  initial  implementation 
should  be  completed  during  the  next  six  month  period.  The  Ideal 
portable  LISP  eystem  would  probably  include  both  pseudo-code  (for 
Initial  rapid  implementation  and  code  packing  efficiency),  and 
abstract-machine  code  (for  critical  or  extremely  machine  dependent 
featuree  euch  as  I/O)  to  produce  a system  that  executes  with 
reasonable  speed,  and  a wide  range  of  capabilities  on  even  quite 
email  machines. 

Sect  on  3 

Sparse  Matrices  and  Factored  Polymonlal  Algebra 

He  have  continued  improving  the  programs  described  in  the 
previous  reports,  with  particular  emphasis  on  replacing  the  expanded 
polynomial  representation  by  factored  polymoniale  in  the  different 


matrix  eolution  methods. 


The  Bare  I ••  (gauselan  el  Imlnation)  method  gains  significantly 
In  speed,  as  ueil  as  producing  a factored  result.  The  Miner 
Expansion  method  takes  much  longer,  losing  its  former  advantage. 

The  minor  expansion  method  involves  many  more  algebraic  operations 
constructing  many  terms  that  are  discarded  before  the  final  result. 

Using  the  expanded  representation,  most  of  these  operations  involvo 
simpler  polynomials  then  in  the  Bareiss  caee.  compensating  for  the 
increased  number.  The  factored  representation  seems  to  involve  a 
different  coet/num^sr  tradoff,  which  we  are  studying  further.  One 

1 

method  of  improving  the  minor  expansion  method  is  to  delay 
evaluation  by  constructing  a "formal"  exprseelon  tree.  The  excess 
terms  will  drop  out  before  the  final  tren  Is  evaluated.  Uelng  this 
scheme,  we  can  obtain  the  result  in  only  slightly  more  time  than  the 
Bareiss  method.  This  "formal"  representat i on  has  other  advantages 
and  we  plan  to  etudy  how  the  advantages  of  each  representat i on 
(expanded,  factored,  formal)  can  be  exploited  at  the  appropriate 
time  In  large  calculations  such  as  these. 


Section  4 

Algebraic  Applications  Packages 

A number  of  extremely  useful  algebraic  packages  have  been 
completed  and  documented  within  this  period.  The  first  is  a fairly 
pouerful  pattern  machine  symbolic  integrator,  capable  of  handling  a 
very  large  class  of  common  integrals,  with  polynomial,  rational, 
tr igonomentric  and  other  integrands.  This  package  comblnee  a number 
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of  interesting  techniques,  and  ui  1 1 provide  a "built-in"  integration 
facility  previously  lacking  in  REDUCE.  The  package  is  easily 
extensible  with  the  addition  of  neu  patterns  and  uiil  also  provide 
an  interface  to  other  more  afflcient  but  specialized  methods  of 
integration  IS] . 

Another  package  of  some  interest  is  a program  for  solving  a 
variety  of  equations  in  finite  terms.  This  capability  is  embodied 
in  a general  purpose  "SOLVE"  command,  that  claseiflee  the  equation 
as  uhether  it  has  one  or  mors  unknowns,  whether  or  not  it  is 
non-linear,  and  whether  or  not  it  is  simply  polynomial,  or  involves 
radicals  and  transcendental  functions.  By  repeated  application  of 
algebraic  identities,  square  free  factorizations,  and  reduction  of 
the  equation  to  knoun  linear,  quadratic  or  cubic  forms,  the  program 
ie  able  to  solve  a considerable  number  of  quite  complex  squat  lone. 
It  also  provides  another  convenient  interface  to  the  linear  equation 
solving  facilities  in  ths  system  and  indicates  the  need  for  further 
extensions,  such  as  tne  solution  of  systems  of  non-linsar  polynomial 
aquations  161, 

fhe  previous  tuo  packages  were  developed  as  tools  to  be  used  in 
an  integral  equation  package,  designed  along  the  line  of  more  common 
differential  equation  packages.  This  package  attempts  to  classify 
ths  input  equation  into  one  of  a number  of  ciasses,  and  then  appl  ies 
a battery  of  techniques,  some  exact  and  some  in  the  form  of 
"analytic"  approximations. 
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While  the  classification  methods  mag  often  Indicate  that  an 

* 

equation  Mill  rapidly  yield  to  a particular  method  (e.g.  Laplace 
Transform,  or  Separable  Kernal),  others  can  only  occasionally  *re 
solved  exactly  and  some  kind  of  series  expansion  .till  at  best  be 
able  to  give  some  feeling  for  the  behavior  of  the  solution  near  It  e 
singularities,  and  dependence  on  input  parameters,  host  of  the 
exact  methods  lead  to  complicated  algebraic  equations  to  solve, 
while  the  series  methods  (Neuman,  Taylor,  or  Fredholm)  required 
repeated  analytic  integrations  or  differentiations.  In  many  cases, 
the  package  Is  limited  by  the  current  limitations  of  the  SOLVE 
command  or  the  Integration  package. 
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Over  the  past  six  months,  much  of  the  work  of  the  past  eeveral 
yeare  has  been  consolidated,  focused  by  the  implementation  of 
eeiected  techniques  on  the  nsu  POP-11/45  based  facility.  The  baeie 
f'sr  promising  new  research  efforts  involving  the  results  of  graphics 
wc~k  hae  been  defined. 

Section  1 

Complex  Scene  Image  Generation 
Martin  E.  Newel  I 

The  properties  of  procedure  mode  I e ae  applied  to  the 
representation  of  three  dimensional  objects,  for  the  purpose  of 
synthesizing  imagee  in  the  form  of  shaded  pictures,  have  been 
Investigated.  It  hae  been  shown  that  procedure  models  facilitate 
th*  processing  of  scones  of  far  greater  comlexity  than  has  proved 
practicable  ueing  data  base  modeling  techniques  alone.  The 
generality  and  flexibility  of  procedure  models  has  enabled  a system 
to  be  implemented  which  can  be,  and  has  been,  incrementally  expanded 
to  accomodate  new  model  formulations.  Examplee  of  output  generated 
by  the  system  are  shown  in  Figures  1.1  and  1.2. 

It  is  believed  that  the  benefits  of  procedure  models  are  not 
confined  to  the  field  of  image  synthesie,  but  have  coneiderable 
roiavanco  In  many  areas  where  modeling  of  three  dimensional  objects 
Is  of  concern,  such  as  computer  aided  design,  computer  aided 
manufacture,  stress  analysis  and  dynamics  simulation. 
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Section  2 

Measurement  and  Analysis  of  3-D  Scenss 
Henry  Fuchs 

Dofcrlbad  are  tha  dacign  and  implementation  of  a new 
rango-meaeuring  seneing  device  and  an  associated  software  algorithm 
for  constructing  surface  descriptions  of  arbitrary  three-dimensional 
objects  from  single  or  multiple  views. 

The  seneing  device,  which  measures  surface  points  from  objects 

I te  environment,  ia  a computer-controlled,  random-access, 
triangulating  rangefinder  with  a mirror-def lected  laser  beam  and 
revolving  disc  detectors. 

The  algorithm  developed  processes  these  surface  points  and 
gee  aretes,  in  a deterministic  fashion,  complete  surface  descriptions 
of  all  encountered  objects.  In  its  processing,  the  algorithm  also 
detects  parte  of  objects  for  which  there  is  insufficient  data,  and 
can  supply  the  sensing  device  jith  the  control  parametsrs  needed  to 
succeseful ly  measure  the  uncharted  region^. 

The  resulting  object  descriptions  are  eui  table  for  use  in  a 
number  of  areas,  such  as  computer  graphics,  where  the  process  of 
constructing  object  definitions  has  heretofore  been  very  tedious. 
Together  with  the  sensing  device,  this  approach  to  object 
description  can  be  utilized  in  a variety  of  scene  analysis  and 
pattern  recognition  applications  which  involve  Interaction  with 
"real  world,"  three-dimensional  objects. 
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Section  3 

Real-Time  Measurement  of  3-0  Positions 
Larry  Evans 


Test  equipment  is  currently  being  acquired  and  interfaced  to 
the  PDP-11/45  computer.  It  is  expected  that  the  summer  Mill  see  the 
completion  of  tests  needed  to  determine  the  feaelbility  and 
etructure  of  an  advanced  position  measuring  device. 


Section  4 

Advanced  Image  Quality 
Frank  Crou 


The  problen  of  aliasing  in  pictures  displayed  on  a discrete 
array  of  picture  elements  has  received  continuing  attention.  A 
model  of  the  effects  of  aliasing  has  bsen  developed.  The  two  main 
effects,  namely  loss  of  small  detail  and  staircase  edges,  have  been 
almost  ertirely  removed  in  carefully  designed  test  situations.  The 
application  of  the  developed  understanding  to  existing  visible 
surface  algorithms  is  the  subject  of  the  efforts  during  the  eummer. 


Section  5 
PDP-11/45  Facility 


Apart  from  some  problems  of  secondary  importance,  the  frame 
buffer  has  been  successfully  functioning  for  six  months.  Only  color 
television  output  has  been  used.  Problems  with  the  existing 
precision  display  have  thus  far  precluded  Its  successful  use,  but 


that  situation  will  be  resolved  in  the  near  future. 
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